Abstract

We consider the probability hierarchy for Popperian FINite learning and study
the general properties of this hierarchy. We prove that the probability hierarchy is
decidable, i.e. there exists an algorithm that receives $p_1$ and $p_2$ and answers whether
PFIN-type learning with the probability of success $p_1$ is equivalent to PFIN-type
learning with the probability of success $p_2$.

To prove our result, we analyze the topological structure of the probability hierarchy.
We prove that it is well-ordered in descending ordering and order-equivalent to the
ordinal $\epsilon_0$. This shows that the structure of the hierarchy is very complicated.

Using similar methods, we also prove that, for PFIN-type learning, team learning 
and probabilistic learning are of the same power. 

1 Introduction 



Inductive inference is a branch of theoretical computer science that studies the process of 
learning in a recursion-theoretic framework [16, 6, 25]. Within inductive inference, there has 
been much work on team learning (see surveys in [2, 18, 31]). 

Probabilistic learning is closely related to team learning. Any team of machines can be 
simulated by a single probabilistic machine with the same success ratio. The simulation of 
a probabilistic machine by a team of deterministic machines is often possible as well. 

In this paper, we consider finite learning of total recursive functions (abbreviated as FIN).
The object to be learned is a total recursive function $f$. A learning machine reads the values
of the function $f(0), f(1), \ldots$ and produces a program computing $f$ after having seen a finite
initial segment of $f$. The learning machine is not allowed to change the program later.

FIN is supposed to be one of the simplest learning paradigms. However, if we consider 
probabilistic and team learning, the situation becomes very complex. Probabilistic FIN-type 
learning has been studied since 1979 but we are still far from a complete understanding of
this area.

The investigation of probabilistic FINite learning was started by Freivalds [14]. He gave a
complete description of the learning capabilities for probabilistic machines with probabilities
of success above $1/2$. These results were extended to team learning by Daley et al. [32, 13].

Further progress was very difficult. Daley, Kalyanasundaram and Velauthapillai [11]
determined the capabilities for probabilistic learners with success probabilities in the interval
$[24/49, 1/2]$. Later, Daley and Kalyanasundaram [10] extended that to the interval $[12/25, 1/2]$. Proofs
became more and more complicated. (The full version of [10] is more than 100 pages long.)

PFIN (Popperian FIN)-type learning is a simplified version of FIN-type learning. In
PFIN-type learning, a learning machine is allowed to output only programs computing
total recursive functions. Many properties of probabilistic and team PFIN-type learning are
similar to those of FIN-type learning. Yet, PFIN-type learning is simpler and easier to analyse than
unrestricted FIN-type learning.

Daley, Kalyanasundaram and Velauthapillai [9, 12] determined the capabilities of probabilistic
PFIN-type learners in the interval $[3/7, 1/2]$. However, even for PFIN-type learning, the
situation becomes more and more complicated for smaller probabilities of success.

In this paper, we suggest another approach to PFIN-type and FIN-type learning. Instead 
of trying to determine the exact points at which the learning capabilities are different (either 
single points or sequences of points generated by a formula), we investigate global properties 
of the probability structure. 

Our main result is that the probability hierarchy for PFIN-type learning is well-ordered
in a decreasing ordering and has a constructive description similar to systems of notations for
constructive ordinals. We use this result to construct a decision algorithm for the probability
hierarchy. Given two numbers $p_1, p_2 \in [0, 1]$, the decision algorithm answers whether
learning with probability $p_1$ is equivalent to learning with probability $p_2$. Also, we
construct a universal simulation algorithm receiving

• $p_1, p_2 \in [0, 1]$ such that PFIN-learning with these probabilities is equivalent and

• a PFIN-learning machine $M$ with the probability of success $p_1$

and transforming $M$ into a machine $M'$ with the probability of success $p_2$.

All of these results make heavy use of the well-ordering and the system of notations. To 
our knowledge, this is the first application of well-orderings to a problem of this character. 
(They have been used in computational learning theory [1, 15], but for entirely different
purposes.)

We also determine the exact ordering type of the probability hierarchy. It is
order-isomorphic to $\epsilon_0$, a quite large ordinal.¹ The part of the hierarchy investigated before ($[3/7, 1]$)
is order-isomorphic to the ordinal $3\omega$ and is very simple compared to the entire probability
hierarchy. Thus, we can conclude that finding a more explicit description for the whole
hierarchy is unlikely. (The previous research shows that, even for segments like $[3/7, 1]$ with a
simple topological structure, this task is difficult because of irregularities in the hierarchy [9].)

Our results also imply that any probabilistic PFIN-type learning machine can be simulated
by a team of deterministic machines with the same success ratio.

2 Technical preliminaries 

2.1 Notations 

We use the standard recursion theoretic notation[29]. 

$\mathbb{N}$ denotes $\{0, 1, \ldots\}$, the set of natural numbers. $\mathbb{N}^+$ denotes $\{1, 2, \ldots\}$, the set of
positive natural numbers, $\mathbb{Q}$ denotes the set of rational numbers and $\mathbb{R}$ the set of real
numbers. $\subseteq$ and $\subset$ denote a subset and a proper subset, respectively.

Let $\varphi$ denote an arbitrary fixed acceptable programming system (a.k.a. Gödel numbering)
of all partial recursive functions [28, 29, 23]. $\varphi_i$ denotes the $i$-th program in system $\varphi$.

2.2 Finite learning of functions 

A learning machine is an algorithmic device which reads values of a recursive function $f$:
$f(0), f(1), \ldots$. Having seen finitely many values of the function, it can output a conjecture.
The conjecture is a program in some fixed acceptable programming system. Only one
conjecture is allowed, i.e. the learning machine cannot change its conjecture later.

A learning machine $M$ FIN-learns a function $f$ if, receiving $f(0), f(1), \ldots$ as the input,
it produces a program computing $f$. $M$ FIN-learns a set of functions $U$ if it FIN-learns every
$f \in U$. A set of functions $U$ is FIN-learnable if there exists a learning machine that FIN-learns
$U$. The collection of all FIN-learnable sets is denoted FIN.

PFIN-learning is a restricted form of FIN-learning. A learning machine $M$ PFIN-learns $U$
if it FIN-learns $U$ and all conjectures (even incorrect ones) of $M$ on all inputs are programs
computing total recursive functions. The collection of all PFIN-learnable sets is denoted
PFIN.

2.3 Probabilistic and team learning 

Scientific discoveries are rarely made by one person. Usually, a discovery is the result of
collective effort. In the area of computational learning theory, this observation has inspired
the research on team learning.

¹ It is known that $\epsilon_0$ is isomorphic to the set of all expressions possible in first-order arithmetic.

A team is just a set of learning machines: $M = \{M_1, \ldots, M_s\}$. The team $M$ $[r, s]$FIN-learns
a set of functions $U$ if, for every $f \in U$, at least $r$ of $M_1, \ldots, M_s$ FIN-learn $f$. The
collection of all $[r, s]$FIN-learnable sets is denoted $[r, s]$FIN.

We also consider learning by probabilistic machines. A probabilistic machine has
access to a fair coin and its output depends on both the input and the outcomes of coin flips.

Let $M$ be a probabilistic learning machine. $M$ FIN($p$)-learns (FIN-learns with probability
$p$) a set of functions $U$ if, for any function $f \in U$, the probability that $M$ outputs a program
computing $f$, given $f(0), f(1), \ldots$ as the input, is at least $p$. FIN($p$) denotes the collection
of all FIN($p$)-learnable sets.

Probabilistic and team PFIN-learning are defined by adding a requirement that all conjectures
output by the probabilistic machine or any machine in the team must be programs
computing total recursive functions.

Definition 1 The probability hierarchy for FIN is the set $A \subseteq \mathbb{R} \cap [0, 1]$ such that

1. For any two different $p_1, p_2 \in A$,

$$\mathrm{FIN}(p_1) \neq \mathrm{FIN}(p_2),$$

i.e., learning with probability of success $p_1$ is not equivalent to learning with probability
of success $p_2$.

2. If $x \in A$, $p < x$ and $[p, x[$ does not contain any points belonging to $A$, then

$$\mathrm{FIN}(x) = \mathrm{FIN}(p).$$

Essentially, the probability hierarchy is the set of those probabilities at which the learning 
capabilities of probabilistic machines are different. 

The probability hierarchy for PFIN is defined similarly. 

2.4 Well-orderings and ordinals 

A linear ordering is a well-ordering if it does not contain infinite descending sequences. 
Ordinals[30] are standard representations of well-orderings. 

The ordinal 0 represents the ordering type of the empty set, the ordinal 1 represents
the ordering type of any 1-element set, the ordinal 2 represents the ordering type of any
2-element set and so on. The ordinal $\omega$ represents the ordering type of the set $\{0, 1, 2, \ldots\}$.
The ordinal $\omega + 1$ represents the ordering type of $\{0, 1, 2, \ldots\}$ followed by an element $\omega$. The
ordinal $2\omega$ represents the ordering type of $\{0, 1, 2, \ldots\}$ followed by $\{\omega, \omega + 1, \omega + 2, \ldots\}$. Greater
ordinals can be defined similarly [30]. We use arithmetic operations on ordinals defined in
two different ways.

Definition 2 [22] Let $A$ and $B$ be two disjoint sets, $\alpha$ be the ordering type of $A$ and $\beta$ be
the ordering type of $B$.

1. $\alpha + \beta$ is the ordering type of $A \cup B$ ordered so that $x < y$ for any $x \in A$, $y \in B$ and the
order is the same within $A$ and $B$.

2. $\alpha\beta$ is the ordering type of $A \times B$ ordered so that $(x_1, y_1) < (x_2, y_2)$ iff $x_1 < x_2$ or
$x_1 = x_2$ and $y_1 < y_2$.

We note that both the sum and the product of ordinals are non-commutative. For
example, $1 + \omega = \omega \neq \omega + 1$.

Definition 3 [22] $\alpha - \beta$ (the difference of $\alpha$ and $\beta$) is an ordinal $\gamma$ such that $\alpha = \beta + \gamma$.

$\alpha - \beta$ always exists and is unique [22]. We also use the natural sum and the natural product
of ordinals. These operations use the representation of ordinals as exponential polynomials.
In this paper, we consider only ordinals which are less than or equal to

$$\epsilon_0 = \lim(\omega, \omega^{\omega}, \omega^{\omega^{\omega}}, \ldots).$$

Any ordinal $\alpha < \epsilon_0$ can be uniquely expressed in the form

$$\alpha = c_1 \omega^{\alpha_1} + \ldots + c_n \omega^{\alpha_n}$$

where $\alpha_1 > \alpha_2 > \ldots > \alpha_n$ are smaller ordinals and $c_1, c_2, \ldots, c_n \in \mathbb{N}$.

Definition 4 [22] Let

$$\alpha = c_1 \omega^{\alpha_1} + \ldots + c_n \omega^{\alpha_n},$$

$$\beta = d_1 \omega^{\alpha_1} + \ldots + d_n \omega^{\alpha_n}.$$

1. The natural sum of $\alpha$ and $\beta$ is

$$\alpha (+) \beta = (c_1 + d_1) \omega^{\alpha_1} + \ldots + (c_n + d_n) \omega^{\alpha_n}.$$

2. $\alpha (\cdot) \beta$, the natural product of $\alpha$ and $\beta$, is the product of the base-$\omega$ representations as
polynomials: $\omega^{\alpha_i} (\cdot) \omega^{\alpha_j} = \omega^{\alpha_i (+) \alpha_j}$ and $\alpha (\cdot) \beta$ is the natural sum of $c_i d_j \omega^{\alpha_i (+) \alpha_j}$ for all
$i, j$.

Natural sum and natural product are commutative. They can be used to bound the ordering
type of unions.
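
As a small illustration of Definition 4, the following Python sketch implements the natural sum and the natural product for ordinals below $\omega^{\omega}$ only, so that all exponents are natural numbers and the natural sum of two exponents is ordinary addition. The dictionary representation (exponent mapped to coefficient) and the function names are assumptions made for this example and do not appear in the paper.

```python
from collections import Counter

# An ordinal below omega^omega is represented as {exponent: coefficient},
# e.g. omega + 1 is {1: 1, 0: 1}. This representation is an assumption
# of the sketch, not notation from the paper.

def natural_sum(a, b):
    """Add the coefficients of equal powers of omega."""
    total = Counter(a)
    for exponent, coefficient in b.items():
        total[exponent] += coefficient
    return {e: c for e, c in total.items() if c != 0}

def natural_product(a, b):
    """Multiply the two representations as polynomials in omega."""
    result = Counter()
    for ea, ca in a.items():
        for eb, cb in b.items():
            # Exponents are natural numbers here, so their natural sum is +.
            result[ea + eb] += ca * cb
    return {e: c for e, c in result.items() if c != 0}

omega_plus_one = {1: 1, 0: 1}
omega = {1: 1}
print(natural_sum(omega_plus_one, omega))      # coefficients of 2*omega + 1
print(natural_product(omega_plus_one, omega))  # coefficients of omega^2 + omega
```

Unlike the ordinary sum and product of Definition 2, both functions give the same result when their arguments are swapped.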

Theorem 1 Let $A_1, \ldots, A_s$ be arbitrary subsets of a well-ordered set $A$, $\alpha_1, \ldots, \alpha_s$ be the
ordering types of $A_1, \ldots, A_s$ and $\alpha$ be the ordering type of $A_1 \cup \ldots \cup A_s$. Then,

$$\alpha \leq \alpha_1 (+) \alpha_2 (+) \ldots (+) \alpha_s.$$

The difference between this theorem and Definition 2 is that Definition 2 requires $x < y$ for
all $x \in A$, $y \in B$ but Theorem 1 has no such requirement. Next, we give a similar result for
the natural product.

Theorem 2 Let $A_1, \ldots, A_s$ and $A$ be well-ordered sets with ordering types $\alpha_1, \ldots, \alpha_s$ and
$\alpha$, respectively. Assume that $f : A_1 \times A_2 \times \ldots \times A_s \to A$ is a strictly increasing function
onto $A$, i.e.

$$f(a_1, \ldots, a_{i-1}, a_i, a_{i+1}, \ldots, a_s) < f(a_1, \ldots, a_{i-1}, a'_i, a_{i+1}, \ldots, a_s)$$

for all $i \in \{1, \ldots, s\}$ and $a_i < a'_i$. Then

$$\alpha \leq \alpha_1 (\cdot) \alpha_2 (\cdot) \ldots (\cdot) \alpha_s.$$

Both Theorem 1 and 2 will be used in section 4. We will also use transfinite induction,
a generalization of the usual mathematical induction.

Theorem 3 [22, Principle of transfinite induction] Let $A$ be a well-ordered set and $P(x)$ be
a predicate. If

1. $P(x)$ is true when $x$ is the smallest element of $A$, and

2. $P(y)$ for all $y \in A$ which are smaller than $x$ implies $P(x)$,

then $P(x)$ for all $x \in A$.

2.5 Systems of notations 

In this paper we use subsets of $\mathbb{Q} \cap [0, 1]$ that are well-ordered in decreasing ordering. A subset
of $\mathbb{Q}$ is well-ordered in decreasing ordering if it does not contain an infinite monotonically
increasing sequence.

Church and Kleene [8, 21] introduced systems of notations for constructive ordinals. Intuitively,
a system of notations is a way of assigning notations to ordinals which satisfies certain
constraints and allows one to extract certain information about the ordinal from its notation.
Below, we adapt the definition by Church and Kleene [8, 21] to well-ordered subsets of $\mathbb{Q}$.

Let $A$ be a subset of $\mathbb{Q}$ which is well-ordered in decreasing ordering. All elements of $A$
can be classified as follows:

1. The greatest element of the set $A$. We call it the maximal element.

2. Elements $x$ which have an immediately preceding element in decreasing ordering (i.e.
an element $y \in A$ such that $x < y$ and $]x, y[$ does not contain any points belonging to $A$).
Such elements are called successor elements.

3. All other elements $x \in A$. They are called limit elements.

Definition 5 A system of notations for $A$ is a tuple of functions $(k_S, p_S, q_S) : \mathbb{Q} \to \mathbb{N}$ such
that

1. $k_S(x)$ is equal to

(a) 0, if $x$ is the maximal element;

(b) 1, if $x$ is a successor element;

(c) 2, if $x$ is a limit element;

(d) 3, if $x \notin A$;

2. If $k_S(x) = 1$, then $p_S(x)$ is defined and it is the element immediately preceding $x$ in
descending ordering.

3. If $k_S(x) = 2$, then $q_S(x)$ is defined and it is a program computing a monotonically
decreasing sequence of elements of the set $A$ converging to $x$.

Systems of notations are convenient for manipulating well-ordered sets in our proofs. 
Possibly, a system of notation is the most appropriate way of describing the probability 
hierarchy for PFIN. The structure of this hierarchy is quite complicated (Section 4) and it 
seems unlikely that more explicit descriptions exist. 

Below, we give a useful property of systems of notations. 

Lemma 1 Let $A \subseteq \mathbb{Q}$ be a set which is well-ordered in descending ordering and has a system
of notations $S$. Let $f_1(p)$ be the largest number in $A$ such that $f_1(p) \leq p$ and $f_2(p)$ be the
smallest number in $A$ such that $p \leq f_2(p)$. Then $f_1$ and $f_2$ are computable functions.

Proof. $f_1$ and $f_2$ are computed by the algorithm below:

1. Set $x$ equal to an arbitrary element of $A$ smaller than $p$.

2. (a) If $x = p$, output: $f_1(p) = f_2(p) = x$. Stop.

(b) If $x$ is a successor element and $p_S(x) > p$, then output: $f_1(p) = x$ and $f_2(p) = p_S(x)$. Stop.

(c) If $x$ is a successor element and $p_S(x) \leq p$, set $x = p_S(x)$.

(d) If $x$ is a limit element and $x \neq p$, take the sequence

$$\varphi_{q_S(x)}(0), \varphi_{q_S(x)}(1), \ldots$$

Search for the smallest $i$ satisfying $\varphi_{q_S(x)}(i) \leq p$ and set $x = \varphi_{q_S(x)}(i)$. (Such $i$
exists because this sequence is monotonically decreasing and converges to $x$ and
$x < p$.)

3. Repeat step 2.

Figure 1: The probability hierarchies for EX, FIN and PFIN

While this algorithm runs, $x$ remains less than or equal to $p$.

From the definition of the system of notations it follows that the values of $f_1$ and $f_2$
output by the algorithm are correct. It remains to prove that the algorithm always outputs
$f_1(p)$ and $f_2(p)$.

For a contradiction, assume that the algorithm does not output $f_1(p)$ and $f_2(p)$ for some
$p \in \mathbb{Q}$. This can happen only if the algorithm goes into an infinite loop, i.e. if Step 2 is executed
infinitely many times.

Each execution of Step 2 increases the value of $x$. Let $x_i$ be the value of $x$ after the
$i$-th repetition of Step 2. Then $x_1, x_2, x_3, \ldots$ is an infinite monotonically increasing sequence.
This contradicts the set $A$ being well-ordered in decreasing order. □
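
The algorithm from the proof of Lemma 1 can be sketched in Python as follows. The sketch runs it on a toy set chosen purely for illustration, $\{n/(2n-1) : n \geq 1\} \cup \{1/2\}$, which is well-ordered in decreasing ordering, has maximal element 1 and a single limit element $1/2$. The functions `k_S`, `p_S` and `q_S` are hypothetical stand-ins for a system of notations. In particular, `q_S` returns a Python function instead of a program index, and the input is assumed to satisfy $1/2 \leq p \leq 1$.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def k_S(x):
    """Classify x: 0 maximal, 1 successor, 2 limit, 3 not in the toy set."""
    if x == 1:
        return 0
    if x == HALF:
        return 2
    # Fractions are stored in lowest terms, so n/(2n-1) is recognisable directly.
    if x.numerator >= 2 and x.denominator == 2 * x.numerator - 1:
        return 1
    return 3

def p_S(x):
    """Element immediately preceding the successor element x = n/(2n-1)."""
    n = x.numerator
    return Fraction(n - 1, 2 * n - 3)

def q_S(x):
    """Decreasing sequence 1, 2/3, 3/5, ... converging to the limit element 1/2."""
    return lambda i: Fraction(i + 1, 2 * i + 1)

def f1_f2(p, start=HALF):
    x = start                      # step 1: an element of the set with x <= p
    while True:                    # step 3: repeat step 2
        if x == p:                 # step 2(a)
            return x, x
        kind = k_S(x)
        if kind == 1:
            y = p_S(x)
            if y > p:              # step 2(b)
                return x, y
            x = y                  # step 2(c)
        elif kind == 2:            # step 2(d)
            seq, i = q_S(x), 0
            while seq(i) > p:
                i += 1
            x = seq(i)
        else:
            raise ValueError("p must lie between the start element and 1")

print(f1_f2(Fraction(13, 20)))  # the neighbours 3/5 and 2/3 of 13/20
```

Each pass through the loop either stops or strictly increases `x`, which mirrors the termination argument of the proof.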

2.6 Three examples 

In Figure 1, we show the known parts of probability hierarchies for three learning criteria: 

• EX (learning in the limit, Pitt and Smith [26, 27]),

• FIN (Freivalds [14], Daley, Kalyanasundaram and Velauthapillai [11]), and

• PFIN (Daley, Kalyanasundaram and Velauthapillai [12, 9]).

We see that these probability hierarchies contain infinite decreasing sequences but none
of them contains an infinite increasing sequence. Known parts of these hierarchies are
well-ordered in decreasing ordering.

We will show that, for PFIN-type learning, the entire hierarchy is well-ordered and will
use this fact to study its properties.

3 Decidability result 

The outline for this section is as follows. We start with describing a set $A$ in two equivalent
forms in subsection 3.1. Then, in subsections 3.2 and 3.3, we show several technical lemmas
about the set $A$, including the equivalence of the two descriptions. Then, we show that $A$
is the probability hierarchy for PFIN. The proof of that consists of two parts: diagonalization
and simulation. The diagonalization part is shown in subsection 3.4. The simulation
argument is more complicated. First, in subsection 3.5, we show that $A$ is well-ordered and has
a system of notations. Finally, in subsection 3.6 we use these technical results to construct
a universal simulation argument. Our diagonalization theorem uses methods from Kummer's
paper on PFIN-teams [20] but the simulation part uses new techniques and is far more
complicated.

3.1 Description of probability hierarchy 

Our description has two equivalent forms. First, we describe it as a set of solutions to a 
particular optimization problem on trees. 

Similarly to [20], we define trees as finite nonempty subsets of $\mathbb{N}^*$ which are closed under
initial segments. The root of each tree is the empty string $\epsilon$. A vertex $u$ is a child of a
vertex $v$ if $u = vn$ for some $n \in \mathbb{N}$. Next, we define labelings of trees by positive reals. The
definition below is equivalent to the one in [20], with some minor technical modifications.

Definition 6 Let $0 < p \leq q$. A $(p, q)$-labeling of a tree $T$ is a pair of mappings
$\nu_1, \nu_2 : T \to \mathbb{R}^+$ such that

1. $\nu_1(\epsilon) \geq p$ and $\nu_2(\epsilon) = 0$,

2. If $t_1, \ldots, t_s$ are all direct successors of $t$, then² $\sum_{i=1}^{s} \nu_2(t_i) \leq \nu_1(t) + \nu_2(t)$ and
$\nu_1(t_i) + \nu_2(t_i) \geq p$ for $i = 1, \ldots, s$,

3. For each branch the sum of the $\nu_1$-labels of all of its nodes is at most $q$.

² The definition in [20] incorrectly uses $\nu_1(t)$ instead of $\nu_1(t) + \nu_2(t)$ here.

Labelings by natural numbers have an intuitive meaning. $\nu_1(v) + \nu_2(v)$ is the number of
machines that have issued a conjecture consistent with the initial segment $v$. In particular,
$\nu_2(v)$ is the number of machines that have issued such a conjecture on some prefix of $v$ and
$\nu_1(v)$ is the number of machines that have output it after seeing the whole segment $v$.

Then, the requirements of the definition have the following interpretation. $\nu_1(t) + \nu_2(t) \geq p$
means that, for every segment $t$ in the tree, there must be at least $p$ machines with conjectures
consistent with $t$.

The second requirement, $\sum_{i=1}^{s} \nu_2(t_i) \leq \nu_1(t) + \nu_2(t)$, has the following interpretation. $\nu_2(t_i)$
is the number of machines which have issued a conjecture consistent with $t_i$ after reading a
prefix of $t_i$. A conjecture consistent with $t_i$ is also consistent with $t$. A prefix of $t_i$ could be
either $t$ or a prefix of $t$. Since a conjecture can be consistent with only one of the segments $t_i$,
$\sum_{i=1}^{s} \nu_2(t_i)$ must be at most the total number of machines which have issued a conjecture
consistent with $t$ after reading either $t$ or a prefix of $t$. The number of such machines is
$\nu_1(t) + \nu_2(t)$.

Finally, the third requirement means that the total number of machines that issue conjectures
on any branch is at most $q$. An example of a labeling is shown in Figure 3.1. The
first number near each node is $\nu_1(t)$, the second number is $\nu_2(t)$.

Labelings with reals have a similar interpretation, with $\nu_1(t)$ and $\nu_2(t)$ being the probabilities
that a probabilistic machine has output a conjecture consistent with $t$.
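
To make the three requirements of Definition 6 concrete, here is a minimal Python checker. The representation is an assumption of this sketch: a labeled tree is a dictionary from nodes (tuples of natural numbers, with the root `()`) to pairs $(\nu_1, \nu_2)$, and the name `is_labeling` is hypothetical. The example tree, a root with two leaf children, is chosen for illustration and is not the tree of Figure 3.1.

```python
from fractions import Fraction

def is_labeling(labels, p, q):
    """Check whether `labels` is a (p, q)-labeling in the sense of Definition 6."""
    nodes = set(labels)
    # A tree is nonempty and closed under initial segments.
    if () not in nodes or any(t[:-1] not in nodes for t in nodes if t):
        return False

    def children(t):
        return [u for u in nodes if len(u) == len(t) + 1 and u[:-1] == t]

    nu1_root, nu2_root = labels[()]
    if nu1_root < p or nu2_root != 0:                     # requirement 1
        return False
    for t in nodes:
        kids = children(t)
        if sum(labels[u][1] for u in kids) > sum(labels[t]):
            return False                                  # requirement 2, first part
        if any(sum(labels[u]) < p for u in kids):
            return False                                  # requirement 2, second part
        if not kids:                                      # t is a leaf: check its branch
            branch = sum(labels[t[:i]][0] for i in range(len(t) + 1))
            if branch > q:                                # requirement 3
                return False
    return True

third = Fraction(1, 3)
real_labels = {(): (2 * third, 0), (0,): (third, third), (1,): (third, third)}
team_labels = {(): (2, 0), (0,): (1, 1), (1,): (1, 1)}
print(is_labeling(real_labels, 2 * third, 1))   # a (2/3, 1)-labeling
print(is_labeling(team_labels, 2, 3))           # the same labeling scaled by 3
```

The integer version can be read as a team of three machines: two conjecture at the root, and one more joins on each branch.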

Let $p_T$ denote the largest number $p$ such that there is a $(p, 1)$-labeling of $T$. (For the
tree in Figure 3.1, $p_T = 6/11$.) Let

$$A = \{p_T \mid T \text{ is a tree}\}.$$

The second description is algebraic, by a recurrence relation. Let the set $A'$ be defined by the
following rules:

1. $1 \in A'$;

2. If $p_1, p_2, \ldots, p_s \in A'$ and $p \in [0, 1]$ is a number such that there exist $q_1, \ldots, q_s \in [0, 1]$
satisfying

(a) $q_1 + q_2 + \ldots + q_s = p$;

(b) $\frac{p}{q_i + 1 - p} = p_i$ for $i = 1, \ldots, s$,

then $p \in A'$.

In section 3.3, we will show that both definitions give the same set $A = A'$. After that,
we will prove that $A$ is the probability hierarchy for PFIN.

3.2 Technical lemmas: algebraic description

In this subsection, we study the properties of the rule that generates the set $A'$. The results
of this subsection are used in various parts of section 3. First, we show that rule 2 can
be described without using the variables $q_1, \ldots, q_s$.

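Lemma 2 If there exist $q_1, \ldots, q_s \in [0, 1]$ satisfying $q_1 + q_2 + \ldots + q_s = p$ and $\frac{p}{q_i + 1 - p} = p_i$
for $i = 1, \ldots, s$, then

$$p = \frac{s}{(s - 1) + \sum_{i=1}^{s} \frac{1}{p_i}}. \qquad (1)$$

Proof. $\frac{p}{q_i + 1 - p} = p_i$ is equivalent to $q_i = \frac{p}{p_i} + p - 1$. Hence,

$$p = \sum_{i=1}^{s} q_i = \sum_{i=1}^{s} \left( \frac{p}{p_i} + p - 1 \right) = \left( \sum_{i=1}^{s} \frac{1}{p_i} \right) p + s \cdot p - s,$$

$$s = \left( \sum_{i=1}^{s} \frac{1}{p_i} \right) p + (s - 1) p,$$

$$p = \frac{s}{(s - 1) + \sum_{i=1}^{s} \frac{1}{p_i}}.$$

□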
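
The computation in Lemma 2 can be checked with exact rational arithmetic. The sketch below evaluates equation (1), recovers the witnesses $q_i = p/p_i + p - 1$ from the proof, and reports that rule 2 does not apply when some $q_i$ falls outside $[0, 1]$ (Lemma 2 only gives the direction from rule 2 to equation (1)). The function names `equation_1`, `witnesses` and `rule2` are hypothetical and introduced for this example only.

```python
from fractions import Fraction

def equation_1(ps):
    """Value of p given by equation (1) for the list p_1, ..., p_s."""
    s = len(ps)
    return Fraction(s) / ((s - 1) + sum(1 / Fraction(pi) for pi in ps))

def witnesses(ps):
    """The numbers q_i = p/p_i + p - 1 from the proof of Lemma 2."""
    p = equation_1(ps)
    return [p / Fraction(pi) + p - 1 for pi in ps]

def rule2(ps):
    """Return (p, [q_1, ..., q_s]) if rule 2 applies to ps, else None."""
    p, qs = equation_1(ps), witnesses(ps)
    if all(0 <= q <= 1 for q in qs):
        return p, qs
    return None

p, qs = rule2([1, Fraction(2, 3)])
print(p, qs, sum(qs) == p)
```

For $p_1 = 1$ and $p_2 = 2/3$ this gives $p = 4/7$ with $q_1 = 1/7$ and $q_2 = 3/7$, while for $p_1 = 1$ and $p_2 = 1/10$ the candidate $q_1$ is negative and `rule2` returns `None`.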

We shall use both forms of rule 2. The rule with $q_i$ is more natural in simulation and
diagonalization arguments but is less convenient for algebraic manipulations. We also use a
version of Lemma 2 where equality is replaced by inequality.

Lemma 3 If there exist $q_1, \ldots, q_s \in [0, 1]$ satisfying $q_1 + q_2 + \ldots + q_s = p$ and $\frac{p}{q_i + 1 - p} \leq p_i$
for $i = 1, \ldots, s$, then

$$p \leq \frac{s}{(s - 1) + \sum_{i=1}^{s} \frac{1}{p_i}}.$$

Proof. Similar to the proof of Lemma 2, with $\leq$ or $\geq$ instead of $=$ where necessary. □

Lemma 2 suggests that rule 2 can be considered as a function of $p_1, \ldots, p_s$. The next
lemmas show that this function is monotonic and continuous.

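Lemma 4 If

1. $p \in A'$ follows from $p_1 \in A', \ldots, p_s \in A'$ by rule 2;

2. $p' \in A'$ follows from $p'_1 \in A', \ldots, p'_s \in A'$ by rule 2;

3. $p_1 \leq p'_1, \ldots, p_s \leq p'_s$,

then $p \leq p'$. If $p_i < p'_i$ for at least one $i$, then $p < p'$.

Proof. By Lemma 2,

$$p = \frac{s}{(s - 1) + \sum_{i=1}^{s} \frac{1}{p_i}} \quad \text{and} \quad p' = \frac{s}{(s - 1) + \sum_{i=1}^{s} \frac{1}{p'_i}}.$$

From $p_i \leq p'_i$ it follows that $\frac{1}{p_i} \geq \frac{1}{p'_i}$ and

$$(s - 1) + \sum_{i=1}^{s} \frac{1}{p_i} \geq (s - 1) + \sum_{i=1}^{s} \frac{1}{p'_i},$$

$$p = \frac{s}{(s - 1) + \sum_{i=1}^{s} \frac{1}{p_i}} \leq \frac{s}{(s - 1) + \sum_{i=1}^{s} \frac{1}{p'_i}} = p'.$$

If $p_i < p'_i$ for some $i$, then $1/p_i > 1/p'_i$ and all inequalities are strict. □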

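Lemma 5 Let $p_j = \lim_{i \to \infty} p_{j,i}$ and $r = \lim_{i \to \infty} r_i$. If, for all $i \in \mathbb{N}$, $r_i \in A'$ follows from
$p_{1,i} \in A', \ldots, p_{s,i} \in A'$ by rule 2, then $r \in A'$ follows from $p_1 \in A', \ldots, p_s \in A'$ by rule 2.

Proof.

$$r = \lim_{i \to \infty} r_i = \lim_{i \to \infty} \frac{s}{(s - 1) + \sum_{j=1}^{s} \frac{1}{p_{j,i}}} = \frac{s}{(s - 1) + \sum_{j=1}^{s} \frac{1}{p_j}}.$$

□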

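The last result of this section relates the numbers generated by applications of rule
2 to $p_1 \in A', \ldots, p_s \in A'$ and to $\frac{p_1}{1 + p_1} \in A', \ldots, \frac{p_s}{1 + p_s} \in A'$.

Lemma 6 An application of rule 2 to $x_1 \in A', \ldots, x_s \in A'$ generates $p \in A'$ if and only
if an application of rule 2 to $\frac{x_1}{1 + x_1} \in A', \ldots, \frac{x_s}{1 + x_s} \in A'$ generates $\frac{p}{1 + p} \in A'$.

Proof. Assume that equation (1) is true for $p_1 = x_1, \ldots, p_s = x_s$. Then,

$$\frac{p}{1 + p} = \frac{s}{s + (s - 1) + \sum_{i=1}^{s} \frac{1}{x_i}} = \frac{s}{(s - 1) + \sum_{i=1}^{s} \left( \frac{1}{x_i} + 1 \right)} = \frac{s}{(s - 1) + \sum_{i=1}^{s} \frac{1 + x_i}{x_i}}.$$

This is precisely equation (1) for $p_1 = \frac{x_1}{1 + x_1}, \ldots, p_s = \frac{x_s}{1 + x_s}$, with $\frac{p}{1 + p}$ in place of $p$.

The opposite direction (equation (1) for $p_1 = \frac{x_1}{1 + x_1}, \ldots, p_s = \frac{x_s}{1 + x_s}$ implies that equation (1) is
true for $p_1 = x_1, \ldots, p_s = x_s$) is similar. □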
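
The identity behind Lemma 6 is easy to confirm on a concrete instance. The following sketch only checks the equation (1) part of the statement for $x_1 = 1$, $x_2 = 2/3$; the helper names `equation_1` and `shrink` are assumptions of this example.

```python
from fractions import Fraction

def equation_1(ps):
    """Value of p given by equation (1) for the list p_1, ..., p_s."""
    s = len(ps)
    return Fraction(s) / ((s - 1) + sum(1 / Fraction(pi) for pi in ps))

def shrink(x):
    """The map x -> x / (1 + x) used in Lemma 6."""
    return x / (1 + x)

xs = [Fraction(1), Fraction(2, 3)]
p = equation_1(xs)
p_from_shrunk_inputs = equation_1([shrink(x) for x in xs])
print(p, p_from_shrunk_inputs, shrink(p))  # prints 4/7 4/11 4/11
```

Applying the rule to the shrunk inputs $1/2$ and $2/5$ gives $4/11$, which is exactly $p/(1+p)$ for $p = 4/7$.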
3.3 Technical lemmas: tree description

We start by showing that for a tree $T$ and its subtrees $T_i$, $p_T$ and $p_{T_i}$ are related similarly
to rule 2.

Lemma 7 Let $r > 0$ and $T$ be a tree with a $(p, q)$-labeling. Then, there is a $(pr, qr)$-labeling
for $T$.

Proof. We multiply all labels by $r$ and obtain a $(pr, qr)$-labeling. □

Lemma 8 Let $t_1, \ldots, t_s$ be all direct successors of the root in a tree $T$ and $T_1, T_2, \ldots, T_s$ be
the subtrees with roots $t_1, t_2, \ldots, t_s$. Assume there are $q_1, \ldots, q_s$ such that $\sum_{i=1}^{s} q_i = p$ and

$$p = p_{T_i} (q_i + 1 - p)$$

for $i \in \{1, \ldots, s\}$. Then $p_T = p$.

Proof. First, we construct a $(p, 1)$-labeling. Let $\nu_1^i, \nu_2^i$ be a $(p_{T_i}, 1)$-labeling for $T_i$. We
define

$$\nu_1(t) = \begin{cases} p, & \text{if } t = \epsilon \\ p - q_i, & \text{if } t = t_i \\ (1 + q_i - p)\, \nu_1^i(t), & \text{if } t \text{ is a descendant of } t_i \end{cases}$$

$$\nu_2(t) = \begin{cases} 0, & \text{if } t = \epsilon \\ q_i, & \text{if } t = t_i \\ (1 + q_i - p)\, \nu_2^i(t), & \text{if } t \text{ is a descendant of } t_i \end{cases}$$

Properties 1 and 2 can be checked directly from the definitions of $\nu_1$ and $\nu_2$.

We prove Property 3. Let $u$ be a direct successor of $t_i$. Then, the sum of $\nu_1^i$-labels on
any branch starting at $u$ is at most $1 - p_{T_i}$. (By Property 3 of $\nu_1^i$, it is at most 1 for any
branch starting at $t_i$ and $\nu_1^i(t_i) \geq p_{T_i}$.) Hence, the sum of $\nu_1$-labels for such a branch is at
most $(q_i + 1 - p)(1 - p_{T_i})$. A branch starting at $\epsilon$ consists of $\epsilon$, $t_i$ and a branch starting at a
direct descendant of $t_i$. Hence, the sum of all its $\nu_1$-labels is at most

$$p + (p - q_i) + (q_i + 1 - p)(1 - p_{T_i}) = p + 1 - (1 + q_i - p) + (q_i + 1 - p)(1 - p_{T_i}) =$$

$$p + 1 - (q_i + 1 - p)\, p_{T_i} = p + 1 - p = 1.$$

For a contradiction, assume that there is $p' > p$ and a $(p', 1)$-labeling $(\nu'_1, \nu'_2)$ for $T$. Let
$q'_i = \nu'_2(t_i)$. If we restrict ourselves to the subtree $T_i$ and add $\nu'_2(t_i)$ to $\nu'_1(t_i)$, we obtain a
$(p', 1 - p' + q'_i)$-labeling for $T_i$. By Lemma 7, there is a $(p'/(1 - p' + q'_i), 1)$-labeling for $T_i$.
Hence,

$$\frac{p'}{1 - p' + q'_i} \leq p_{T_i} = \frac{p}{1 - p + q_i} < \frac{p'}{1 - p + q_i},$$

$$(1 - p' + q'_i) > (1 - p + q_i),$$

$$p' - q'_i < p - q_i.$$

We consider the sum of these expressions for all $i$.

$$(s - 1) p' \leq s \cdot p' - \sum_{i=1}^{s} q'_i = \sum_{i=1}^{s} (p' - q'_i) < \sum_{i=1}^{s} (p - q_i) = s \cdot p - \sum_{i=1}^{s} q_i = (s - 1) p$$

and $p' < p$. Contradiction, proving the lemma. □

By Lemma 2, the relation between $p_T$ and $p_{T_1}, \ldots, p_{T_s}$ is also expressed by equation
(1). We can now show the equivalence of the two definitions.

Lemma 9 $A = A'$.

Proof. By induction. If $p \in A'$ follows from $p_1, \ldots, p_s \in A'$ by rule 2 and $p_{T_i} = p_i$ for trees
$T_i$, we construct a tree $T$ consisting of the root and $T_1, \ldots, T_s$, and make the roots of $T_1, T_2, \ldots, T_s$
children of $T$'s root. Then, $p_T = p$ (by Lemma 8). Hence, for any $p \in A'$, there is a tree
$T$ with $p_T = p$. This means $A' \subseteq A$.

Similarly, we can show that $p_T \in A'$ for any tree $T$. □

Next, we show that the $(p_T, 1)$-labeling of Lemma 8 uses only rational numbers and,
hence, can be transformed into a labeling that uses only integers.

Lemma 10 For any tree $T$, $p_T \in \mathbb{Q}$.

Proof. By induction over the depth of $T$. For a tree consisting of the root only, $p_T = 1$.

Otherwise, let $t_1, \ldots, t_s$ be all direct successors of the root in $T$ and $T_1, T_2, \ldots, T_s$ be the
subtrees with roots $t_1, t_2, \ldots, t_s$. The depth of these subtrees is smaller than the depth of $T$.
Hence, all $p_{T_i}$ are rationals. Equation (1) implies that $p_T$ is rational, too. □
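
The induction in Lemma 10 translates into a short recursive computation. In the sketch below a tree is a nested Python list (a node is the list of its subtrees, a leaf is `[]`), which is a representation chosen for this example. The guard on $q_i \geq 0$ reflects my reading of what the labeling of Lemma 8 needs, since $q_i$ is used there as a label; the lemma itself does not state a range for $q_i$, so the guard is an assumption.

```python
from fractions import Fraction

def p_T(tree):
    """Compute p_T through equation (1), assuming the hypothesis of Lemma 8 holds."""
    if not tree:                       # a tree consisting of the root only
        return Fraction(1)
    ps = [p_T(subtree) for subtree in tree]
    s = len(ps)
    p = Fraction(s) / ((s - 1) + sum(1 / pi for pi in ps))
    qs = [p / pi + p - 1 for pi in ps]  # q_i solving p = p_{T_i} (q_i + 1 - p)
    if any(q < 0 for q in qs):
        raise ValueError("no nonnegative q_i: Lemma 8 is not applicable as used here")
    return p

leaf = []
cherry = [leaf, leaf]                  # a root with two leaf children
print(p_T(leaf), p_T(cherry), p_T([leaf, cherry]), p_T([cherry, cherry]))
```

All intermediate values are `Fraction` objects, so rationality of $p_T$ is visible directly in the computation.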

Lemma 11 The $(p_T, 1)$-labeling constructed in the proof of Lemma 8 uses only rational numbers.

Proof. By induction over the depth of $T$. Again, the lemma is evident for the tree with the
root only.

For other trees, notice that all $q_i$ can be expressed by $p$ and $p_{T_i}$. Hence, $q_1, \ldots, q_s$ are
rationals. The label of the root is the rational number $p$, the labels of $t_1, \ldots, t_s$ are the rationals
$p - q_1, \ldots, p - q_s$ and the labels of other nodes are $(1 - p + q_i)\, \nu_j^i(t)$. $(1 - p + q_i)$ is a rational number
because $p$ and $q_i$ are rationals and $\nu_j^i(t)$ is a rational number because $\nu_j^i$ is a part of the
$(p_{T_i}, 1)$-labeling for a tree of smaller depth. □

Corollary 1 Let $T$ be a tree. Then there is $n \in \mathbb{N}$ such that $T$ has a $(p_T n, n)$-labeling with
labels from $\mathbb{N}$.

Proof. Let $n$ be the least common denominator of all rational numbers in the $(p_T, 1)$-labeling
$\nu_1, \nu_2$ of Lemma 8. Then, $n\nu_1(t), n\nu_2(t)$ is a $(p_T n, n)$-labeling and uses only natural numbers.
□
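
The construction of Lemma 8 and the scaling step of Corollary 1 can be sketched together. As an assumption of this example, trees are nested lists (a leaf is `[]`), nodes are tuples of child indices, and the code requires Python 3.9 or later for `math.lcm`. The check on $q_i \geq 0$ is again my reading of what the construction needs, not a condition stated in the lemma.

```python
from fractions import Fraction
from math import lcm

def labeling(tree):
    """Return (p_T, labels), where labels maps a node to the pair (nu_1, nu_2)."""
    if not tree:
        return Fraction(1), {(): (Fraction(1), Fraction(0))}
    parts = [labeling(subtree) for subtree in tree]
    s = len(parts)
    p = Fraction(s) / ((s - 1) + sum(1 / pi for pi, _ in parts))
    labels = {(): (p, Fraction(0))}
    for i, (pi, sub_labels) in enumerate(parts):
        q = p / pi + p - 1                 # q_i from the hypothesis of Lemma 8
        if q < 0:
            raise ValueError("negative q_i: the construction does not apply")
        scale = 1 + q - p
        for node, (nu1, nu2) in sub_labels.items():
            if node == ():                 # the root t_i of the subtree T_i
                labels[(i,)] = (p - q, q)
            else:                          # a descendant of t_i
                labels[(i,) + node] = (scale * nu1, scale * nu2)
    return p, labels

def integer_labeling(tree):
    """Scale the labeling by the least common denominator n, as in Corollary 1."""
    p, labels = labeling(tree)
    n = lcm(*(x.denominator for pair in labels.values() for x in pair))
    return p * n, n, {t: (int(a * n), int(b * n)) for t, (a, b) in labels.items()}

m, n, team = integer_labeling([[], [[], []]])
print(m, n, team)
```

For this tree the result is a $(4, 7)$-labeling, which by Lemma 12 corresponds to a team of 7 machines of which at least 4 succeed.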
3.4 Universal diagonalization

Let $0^j$ denote a sequence of $j$ zeros and $0^{\omega}$ denote an infinite sequence of zeros. Let $K$ be the
halting set, i.e. the set of all $i$ such that program $\varphi_i$ halts on input $i$. Let $K_s$ be the set of
all $i$ such that $\varphi_i$ halts on input $i$ in at most $s$ steps. For a set $S$, let $\chi_S$ be the characteristic
function of $S$: $\chi_S(i) = 1$ if $i \in S$ and $0$ otherwise.

Definition 7 [20] Let $T$ be a tree of depth $d$. $S_T$ is the set of all recursive functions $f$ such
that the sequence of values $f(0), f(1), \ldots$ is of the form

$$i_1 \ldots i_d\, 0^{t_1} a_1\, 0^{t_2} a_2 \ldots 0^{t_l} a_l\, 0^{\omega}$$

where each $t_h = \min\{t : |\{j : i_j \in K_t\}| \geq h\}$ is finite, $(a_1, \ldots, a_l) \in T$, and either
$l = |\{j : i_j \in K\}|$ or $(a_1, \ldots, a_l)$ is a leaf of $T$.

Lemma 12 [20] If $T$ has an $(m, n)$-labeling by integers then

$$S_T \in [m, n]\mathrm{PFIN}.$$

The next lemma is an extension of Kummer's results to probabilistic learning. The proof 
is similar to Theorem 16 in [20]. We give it here for completeness. 

Lemma 13 If $S_T \in (p)\mathrm{PFIN}[O]$ and $K$ is not Turing reducible to $O$, then $T$ has a $(p - \varepsilon, 1)$-labeling
for any $\varepsilon > 0$.

Proof. Let $k$ be the depth of $T$. Let $M$ denote an IIM (inductive inference machine) that identifies $S_T$ with the $O$-oracle.
For arbitrary $i_1, \ldots, i_k$, we enumerate a set $T_{i_1, \ldots, i_k}$.

Define the event $P(c, s)$ to be true iff $c = |\{j : i_j \in K_s\}|$ and, for each $(a_1, \ldots, a_c) \in T$
and $\sigma_c = i_1 \ldots i_k\, 0^{t_1} a_1\, 0^{t_2} a_2 \ldots 0^{t_c} a_c$, the probability that $M^O$ outputs a program computing
a function with an initial segment $\sigma_c$ while reading $\sigma_c 0^s$ is at least $p - \varepsilon$.

The procedure for enumerating $T_{i_1, \ldots, i_k}$ is as follows.

Initialization. Let $t = 0$, $c' = -1$, $T_{i_1, \ldots, i_k} = \emptyset$.

Step $l$. Search for the smallest $s > t$ satisfying $P(c, s)$ for some $c > c'$. If the search
terminates, enumerate $(\chi_{K_s}(i_1), \ldots, \chi_{K_s}(i_k))$ into $T_{i_1, \ldots, i_k}$, set $t = s$, $c' = c$ and go to Step
$l + 1$.

Claim 13.1 $(\chi_K(i_1), \ldots, \chi_K(i_k)) \in T_{i_1, \ldots, i_k}$.

Proof. Let $c = |\{j : i_j \in K\}|$. $P(c, s)$ holds for all sufficiently large $s$ because $M^O$ infers
all functions $\sigma_c 0^{\omega}$. After discovering it, $(\chi_K(i_1), \ldots, \chi_K(i_k)) = (\chi_{K_s}(i_1), \ldots, \chi_{K_s}(i_k))$ is
enumerated into $T_{i_1, \ldots, i_k}$. □

Claim 13.2 $|T_{i_1, \ldots, i_k}| = k + 1$ for some $i_1, \ldots, i_k$.

Proof. If $(\chi_K(i_1), \ldots, \chi_K(i_k)) \in T_{i_1, \ldots, i_k}$ and $|T_{i_1, \ldots, i_k}| \leq k$ for all $i_1, \ldots, i_k$, then, by Fact 6
in [20], $K$ is Turing-reducible to $O$. □

Hence, there exist $i_1, \ldots, i_k$ and $s_0 < \ldots < s_k$ such that $P(l, s_l)$ holds for $l = 0, \ldots, k$.
Define the label $\nu_1(\tau)$ of $\tau = (a_1, \ldots, a_{l-1})$ as the probability that:

1. $M^O$ does not output a program while reading $\sigma_{l-2} 0^{s_{l-2}}$, where $\sigma_c = i_1 \ldots i_k\, 0^{t_1} a_1\, 0^{t_2} a_2 \ldots 0^{t_c} a_c$,
and

2. $M^O$ outputs a program computing a function with the initial segment $\sigma_{l-1}$ while reading
$\sigma_{l-1} 0^{s_{l-1}}$.

For $\tau = \epsilon$, there is no segment $\sigma_{-1}$ and $\nu_1(\epsilon)$ is just the probability that $M^O$ outputs a
program computing a function with the initial segment $\sigma_0$ while reading $\sigma_0 0^{s_0}$.

The label $\nu_2(\tau)$ is $0$ for $\tau = \epsilon$ and the probability that $M^O$ outputs a program computing
a function with the initial segment $\sigma_{l-1}$ while reading $\sigma_{l-2} 0^{s_{l-2}}$ for $\tau = (a_1, \ldots, a_{l-1})$.

Next, we verify that all conditions of Definition 6 are satisfied. Property 1 follows from
the definitions of $\nu_1(\epsilon)$, $\nu_2(\epsilon)$ and $P(0, s)$.

For property 2, notice that $\nu_1(t) + \nu_2(t)$ is the total probability that $M^O$ outputs a
program consistent with $\sigma_{l-1}$ while reading $\sigma_{l-1} 0^{s_{l-1}}$. $\nu_2(t_i)$ are the probabilities that a
particular continuation of $\sigma_{l-1}$ is an initial segment of the function. These events are mutually
exclusive. Hence, $\sum_{i=1}^{s} \nu_2(t_i) \leq \nu_1(t) + \nu_2(t)$. $\nu_1(t_i) + \nu_2(t_i) \geq p - \varepsilon$ is true because $M^O$ outputs
a program consistent with $\sigma_l 0^{s_l}$ with a probability at least $p - \varepsilon$ (by the definition of $P(c, s)$).

Property 3 is true because the sum of all $\nu_1$-labels on any branch is at most the probability
that $M^O$ outputs a conjecture while reading $\sigma_k 0^{s_k}$ and, hence, is at most 1. □

If there is no oracle O, we get 

Corollary 2 If $S_T \in (p)$PFIN, then $T$ has a $(p - \epsilon, 1)$ labeling for any $\epsilon > 0$.

Corollary 3 For a tree $T$, $S_T \in (p_T)$PFIN and $S_T \notin (p_T + \epsilon)$PFIN for any $\epsilon > 0$.

Proof. Corollary 1 and Lemma 12 imply that $S_T \in [p_T n, n]$PFIN for appropriate $n$. A $[p_T n, n]$PFIN team can be simulated by a $(p_T)$PFIN probabilistic machine that chooses one of the $n$ machines in the team equiprobably.

If $S_T \in (p_T + \epsilon)$PFIN, then there is a $(p_T + \epsilon/2, 1)$ labeling of $T$ (Corollary 2). This is impossible because $p_T$ is the largest number such that there is a $(p_T, 1)$ labeling of $T$. □
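
The team-to-probabilistic step in the proof above can be checked with a small calculation. The following is a minimal sketch under a strong simplification: each deterministic machine is reduced to a boolean recording whether it identifies a fixed target function, and the helper name `uniform_choice_success` is hypothetical, not taken from the paper.

```python
from fractions import Fraction

def uniform_choice_success(team_outcomes):
    """Success probability of a machine that runs one team member chosen uniformly.

    team_outcomes: list of booleans; True means that deterministic machine
    identifies the target function (an abstraction, not a real learner).
    """
    n = len(team_outcomes)
    if n == 0:
        raise ValueError("team must be non-empty")
    # Each member is chosen with probability 1/n, so the success
    # probability is the fraction of successful members.
    return Fraction(sum(team_outcomes), n)

# A [2, 3] team: at least 2 of the 3 machines succeed on the target function.
prob = uniform_choice_success([True, False, True])
```

If at least $p_T n$ of the $n$ machines succeed on every function of the class, the value computed here is at least $p_T$ for every such function, which is the bound used in the proof.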

Theorem 4 If $p, q \in A$ and $p \ne q$, then $(p)$PFIN $\ne (q)$PFIN.

Proof. Follows from Corollary 3 and Lemma 9. □

3.5 Well-ordering and system of notations

It remains to prove that, for any probability $p$, PFIN($p$)-type learning is equivalent to PFIN-type learning with some probability belonging to $A$. Our diagonalization technique was similar to [20]. The simulation part is more complicated. Simulation techniques in [20] rely on the fact that each team issues finitely many conjectures and, hence, there are finitely many possible behaviors of these conjectures. A probabilistic machine can issue infinitely many conjectures and these conjectures have infinitely many possible behaviors. This makes simulation far more complicated.

We need an algorithmic structure for manipulating an infinite number of possibilities. 
We establish it by proving that A is well-ordered and has a system of notations. 

Theorem 5 The set A is well-ordered in decreasing ordering and has a system of notations. 

Proof. We construct a system of notations for the set $A$ inductively. First, we construct a system of notations for $A \cap [\frac{1}{2}, 1]$. Then we extend it, obtaining systems of notations for $A \cap [\frac{1}{3}, 1]$, $A \cap [\frac{1}{4}, 1]$ and so on.

Freivalds [14] proved

$$A \cap \left[\tfrac{1}{2}, 1\right] = \left\{\tfrac{1}{2}\right\} \cup \left\{ \tfrac{n}{2n-1} \;\middle|\; n \in \mathbb{N} \;\&\; n \ge 1 \right\}.$$

A system of notations for $A \cap [\frac{1}{2}, 1]$ can be easily constructed from this description. Below, we show how to construct a system of notations for $A \cap [\frac{1}{n+1}, 1]$ using a system of notations for $A \cap [\frac{1}{n}, 1]$.
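
To make the base of the induction concrete, the points of $A$ in $[\frac{1}{2}, 1]$ can be listed directly. The sketch below assumes the description reads $\{\frac{1}{2}\} \cup \{\frac{n}{2n-1} : n \ge 1\}$; the function name `freivalds_points` is ours and purely illustrative.

```python
from fractions import Fraction

def freivalds_points(count):
    """First `count` points n/(2n-1), n = 1, 2, ..., in decreasing order."""
    return [Fraction(n, 2 * n - 1) for n in range(1, count + 1)]

points = freivalds_points(5)
# The list is strictly decreasing and every point lies above 1/2,
# which is the only limit point of this part of the set.
```

In decreasing ordering this part of the set has order type $\omega + 1$: the points $\frac{n}{2n-1}$ followed by their limit $\frac{1}{2}$.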

An outline of our construction is as follows:

1. Split the segment $[\frac{1}{n+1}, \frac{1}{n}]$ into smaller segments $[r_{i+1}, r_i]$ so that, if $p \in [r_{i+1}, r_i]$ and $p \in A$ follows from the rule 2, then $p_1 \ge r_i, \ldots, p_s \ge r_i$. (This property allows us to obtain a system of notations for $A \cap [r_{i+1}, r_i]$ from a given system of notations for $A \cap [r_i, 1]$ without using any knowledge about $A \cap [r_{i+1}, r_i]$.) We give the splitting and prove its properties in subsection 3.5.1.

2. Using transfinite induction over the segments $[r_{i+1}, r_i]$, extend the system of notations for $A \cap [\frac{1}{n}, 1]$ to larger and larger segments $A \cap [r_{i+1}, 1]$, finally obtaining a system of notations for $A \cap [\frac{1}{n+1}, 1]$. This part is described in subsections 3.5.2, 3.5.3, 3.5.4 and 3.5.5.

3.5.1 Splitting the segment $[\frac{1}{n+1}, \frac{1}{n}]$
The splitting consists of two steps.

1. First, we take $\frac{p}{1+p}$ for $p \in A \cap [\frac{1}{n}, \frac{1}{n-1}]$. By Lemma 6, all $\frac{p}{1+p}$ belong to the set $A$. These points split $[\frac{1}{n+1}, \frac{1}{n}]$ into segments $[\frac{p}{1+p}, \frac{r}{1+r}]$.

2. Each segment $[\frac{p}{1+p}, \frac{r}{1+r}]$ is split further by the sequence

$$r_0 = \frac{r}{1+r}, \qquad r_{i+1} = \frac{2}{1 + \frac{1}{p} + \frac{1}{r_i}}.$$

$r_0, r_1, r_2, \ldots$ is a monotonically decreasing sequence converging to $\frac{p}{1+p}$. It splits $[\frac{p}{1+p}, \frac{r}{1+r}]$ into segments $[r_1, r_0]$, $[r_2, r_1]$, $\ldots$.

Let $A_n$ denote the set consisting of all $\frac{p}{1+p}$ and $r_0, r_1, \ldots$ for all segments $[\frac{p}{1+p}, \frac{r}{1+r}]$. Next, we prove several properties of the segments $[r_{i+1}, r_i]$ that will be used further.
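
The second step of the splitting can be traced numerically. The sketch below assumes the recurrence $r_0 = \frac{r}{1+r}$, $r_{i+1} = 2 / (1 + \frac{1}{p} + \frac{1}{r_i})$, which is the form required by the last step of the proof of Lemma 15; the function name `split_sequence` and the sample pair $p = \frac{2}{3}$, $r = 1$ are chosen only for illustration.

```python
from fractions import Fraction

def split_sequence(p, r, steps):
    """Return [r_0, ..., r_steps] for the segment [p/(1+p), r/(1+r)].

    Assumed recurrence: r_0 = r/(1+r), r_{i+1} = 2 / (1 + 1/p + 1/r_i).
    """
    seq = [r / (1 + r)]
    for _ in range(steps):
        seq.append(2 / (1 + 1 / p + 1 / seq[-1]))
    return seq

p, r = Fraction(2, 3), Fraction(1)   # two adjacent elements of A
seq = split_sequence(p, r, 4)        # 1/2, 4/9, 8/19, ...
limit = p / (1 + p)                  # 2/5, the left end of the segment
```

Under this assumption the fixed point of the recurrence satisfies $\frac{1}{r} = 1 + \frac{1}{p}$, that is $r = \frac{p}{1+p}$, which agrees with the stated limit of the sequence.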



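Lemma 14 Let $[\frac{p}{1+p}, \frac{r}{1+r}]$ be a segment obtained in the first step of the splitting. If $x \in A$ follows from $p_1, \ldots, p_s \in A$ by the rule 2 and $x \in [\frac{p}{1+p}, \frac{r}{1+r}]$, then

$$p_1 \le p,\; p_2 \le p,\; \ldots,\; p_s \le p.$$

Proof. We have

$$p_j = \frac{x}{1 - x + q_j} < \frac{x}{1 - x},$$

$$p_j(1 - x) < x,$$

$$p_j < x(1 + p_j),$$

$$\frac{p_j}{1 + p_j} < x.$$

Therefore, $\frac{p_j}{1+p_j} < \frac{r}{1+r}$ and $p_j < r$. Since $p$ and $r$ are adjacent elements of $A$, this gives $p_j \le p$. □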

Lemma 15 Let $x \in A \cap [r_{i+1}, r_i]$. If $x \in A$ follows from $p_1, \ldots, p_s \in A$ by the rule 2, then

$$p_1 \ge r_i,\; p_2 \ge r_i,\; \ldots,\; p_s \ge r_i.$$

Proof. We prove $p_1 \ge r_i$ only. ($p_2 \ge r_i$, $\ldots$ are proved similarly.)

Assume that $[r_{i+1}, r_i]$ was obtained by splitting $[\frac{p}{1+p}, \frac{r}{1+r}]$. Then, $p_1 \le p$, $p_2 \le p$, $\ldots$, $p_s \le p$ (Lemma 14).

From

$$\frac{x}{1 - x + q_j} = p_j$$

it follows that

$$q_j = \frac{x}{p_j} - 1 + x.$$

We have $p_2 \le p$. Hence,

$$q_2 \ge \frac{x}{p} - 1 + x,$$

$$q_1 \le x - q_2 \le 1 - \frac{x}{p},$$

$$p_1 = \frac{x}{1 - x + q_1} \ge \frac{x}{2 - x - \frac{x}{p}} = \frac{1}{\frac{2}{x} - 1 - \frac{1}{p}}.$$

From $x \in [r_{i+1}, r_i]$ we have that $x \ge r_{i+1}$ and

$$p_1 \ge \frac{1}{\frac{2}{x} - 1 - \frac{1}{p}} \ge \frac{1}{\frac{2}{r_{i+1}} - 1 - \frac{1}{p}} = \frac{1}{\left(1 + \frac{1}{p} + \frac{1}{r_i}\right) - 1 - \frac{1}{p}} = r_i.$$

□

We have proved that all $x \in A \cap [r_{i+1}, r_i]$ are generated by applications of the rule 2 to $p_1, \ldots, p_s \in A \cap [r_i, 1]$. The next lemma bounds the number $s$.

Lemma 16 Let $x \in A \cap [r_{i+1}, r_i]$, $[r_{i+1}, r_i]$ being a segment obtained by splitting $[\frac{p}{1+p}, \frac{r}{1+r}]$. If $x \in A$ follows from $p_1, \ldots, p_s \in A$ by the rule 2, then

$$s \le \frac{x}{\frac{x}{p} + x - 1}.$$

Proof. From Lemma 14 we have

$$q_j = \frac{x}{p_j} + x - 1 \ge \frac{x}{p} + x - 1.$$

Hence,

$$x = \sum_{j=1}^{s} q_j \ge s\left(\frac{x}{p} + x - 1\right),$$

$$s \le \frac{x}{\frac{x}{p} + x - 1}.$$

□
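
As a worked instance of the bound in Lemma 16, the sketch below evaluates $x / (\frac{x}{p} + x - 1)$ exactly and rounds down, since $s$ is an integer. The helper name `max_arity` and the sample values are illustrative assumptions.

```python
from fractions import Fraction
from math import floor

def max_arity(x, p):
    """Largest s allowed by the bound s <= x / (x/p + x - 1)."""
    gap = x / p + x - 1
    if gap <= 0:
        # The bound is only meaningful for x > p/(1+p).
        raise ValueError("need x/p + x - 1 > 0")
    return floor(x / gap)

# Segment [2/5, 1/2] obtained from p = 2/3, r = 1.
bound_at_4_9 = max_arity(Fraction(4, 9), Fraction(2, 3))    # x/p + x - 1 = 1/9
bound_at_9_20 = max_arity(Fraction(9, 20), Fraction(2, 3))  # x/p + x - 1 = 1/8
```

The bound grows without limit as $x$ approaches $\frac{p}{1+p}$, which is why the segment is first cut into the pieces $[r_{i+1}, r_i]$, on each of which $s$ stays bounded.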



3.5.2 Well-ordering

Lemma 17 $A_n$ is well-ordered.

Proof. $A \cap [\frac{1}{n}, 1]$ is well-ordered by inductive assumption. Hence, $A \cap [\frac{1}{n}, \frac{1}{n-1}]$ is well-ordered, too. The set $\{\frac{p}{1+p} \mid p \in A \cap [\frac{1}{n}, \frac{1}{n-1}]\}$ is order-isomorphic to $A \cap [\frac{1}{n}, \frac{1}{n-1}]$. Hence, it is well-ordered and the set of segments $[\frac{p}{1+p}, \frac{r}{1+r}]$ into which it splits $[\frac{1}{n+1}, \frac{1}{n}]$ is well-ordered, too.

$A_n$ is obtained by replacing each segment $[\frac{p}{1+p}, \frac{r}{1+r}]$ with the sequence $r_0, r_1, \ldots$. Each sequence is well-ordered. Hence, the entire set $A_n$ is well-ordered. □

Hence, we can use transfinite induction over this set.

Lemma 18 $A \cap [\frac{1}{n+1}, \frac{1}{n}]$ is well-ordered in decreasing ordering.

Proof. By transfinite induction over $A_n$.

Base case. The set $A \cap [\frac{1}{n}, 1]$ is well-ordered.

Inductive case. Let $x \in A_n$. We assume that $A \cap [x', 1]$ is well-ordered for all $x' \in A_n$ with $x' > x$ and prove that $A \cap [x, 1]$ is well-ordered, too. There are three cases:

1. $x = \frac{p}{1+p}$ for $p \in A \cap [\frac{1}{n}, \frac{1}{n-1}]$ and $p$ is a limit element.

Let $p$ be the limit of $p_1, p_2, \ldots$. Then, $\frac{p}{1+p}$ is the limit of $\frac{p_1}{1+p_1}, \frac{p_2}{1+p_2}, \ldots$ because the function $\frac{x}{1+x}$ is continuous. By inductive assumption, each $A \cap [\frac{p_i}{1+p_i}, 1]$ is well-ordered. Hence, their union together with the point $\frac{p}{1+p}$, i.e. $A \cap [\frac{p}{1+p}, 1]$, is well-ordered.

2. $x = \frac{p}{1+p}$ for $p \in A \cap [\frac{1}{n}, \frac{1}{n-1}]$ and $p$ is not a limit element.

We take the segment $[\frac{p}{1+p}, \frac{r}{1+r}]$ obtained in the first step of the splitting and the corresponding sequence $r_0, r_1, \ldots$. $\frac{p}{1+p}$ is the limit of $r_0, r_1, \ldots$. $A \cap [\frac{p}{1+p}, 1]$ is well-ordered because each $A \cap [r_i, 1]$ is well-ordered.

3. $x \ne \frac{p}{1+p}$ for any $p \in A \cap [\frac{1}{n}, \frac{1}{n-1}]$. Then, $x \ne r_0$ because $r_0 = \frac{r}{1+r}$ for $r \in A \cap [\frac{1}{n}, \frac{1}{n-1}]$. Hence, $x = r_{i+1}$ for some $i \ge 0$.

$A \cap [r_i, 1]$ is well-ordered by the inductive assumption because $r_{i+1} < r_i$. Hence, it is enough to prove that $A \cap [r_{i+1}, r_i]$ is well-ordered.

For a contradiction, assume that $A \cap [r_{i+1}, r_i]$ contains an infinite monotonically increasing sequence $x_1, x_2, \ldots$.

Claim 18.1 Let $x_1 \in A \cap [r_{i+1}, r_i]$, $x_2 \in A \cap [r_{i+1}, r_i]$, $\ldots$. There is an $s \in \mathbb{N}$ and sequences $x'_1, x'_2, \ldots$ and $p_{j,1}, p_{j,2}, \ldots$ for $j \in \{1, \ldots, s\}$ such that

(a) $x'_1, x'_2, \ldots$ is a subsequence of $x_1, x_2, \ldots$,

(b) $x'_k \in A$ follows from $p_{1,k}, \ldots, p_{s,k} \in A$ and the rule 2, and

(c) $p_{j,1} = p_{j,2} = \ldots$ or $p_{j,1} > p_{j,2} > \ldots$ for all $j \in \{1, \ldots, s\}$.

Proof. Denote

$$s_1 = \frac{r_{i+1}}{\frac{r_{i+1}}{p} + r_{i+1} - 1}.$$

Consider the applications of the rule 2 that prove $x_1 \in A$, $x_2 \in A$, $\ldots$. By Lemma 16,

$$s \le \frac{x}{\frac{x}{p} + x - 1} \le \frac{r_{i+1}}{\frac{r_{i+1}}{p} + r_{i+1} - 1} = s_1$$

in each of these applications. Hence, there exists an $s_0 \in \{1, \ldots, \lfloor s_1 \rfloor\}$ such that infinitely many of $x_1, x_2, \ldots$ are generated by applications of the rule 2 with $s = s_0$. We denote this subsequence $x_1^{(0)}, x_2^{(0)}, \ldots$.

Next, we select $x_1^{(1)}, x_2^{(1)}, \ldots$, a subsequence of $x_1^{(0)}, x_2^{(0)}, \ldots$. Then, we select $x_1^{(2)}, x_2^{(2)}, \ldots$, a subsequence of $x_1^{(1)}, x_2^{(1)}, \ldots$. We continue so until we obtain $x_1^{(s_0)}, x_2^{(s_0)}, \ldots$.

The subsequence $x_1^{(k)}, x_2^{(k)}, \ldots$ is generated from $x_1^{(k-1)}, x_2^{(k-1)}, \ldots$ as follows:

Let $p_{1,j}^{(k-1)}, \ldots, p_{s_0,j}^{(k-1)}$ be the values of $p_1, \ldots, p_{s_0}$ in the application of the rule 2 that proves $x_j^{(k-1)} \in A$. We use the infinite version of Dilworth's lemma.

Theorem 6 Let $y_1, y_2, \ldots$ be a sequence of real numbers. Then $y_1, y_2, \ldots$ contains

• a subsequence $y_{n_1}, y_{n_2}, \ldots$ such that $y_{n_1} = y_{n_2} = \ldots$, or

• an infinite monotonically increasing subsequence, or

• an infinite monotonically decreasing subsequence.

The sequence $p_{k,1}^{(k-1)}, p_{k,2}^{(k-1)}, \ldots$ does not contain an infinite monotonically increasing subsequence because all elements of this sequence belong to $A \cap [r_i, 1]$ and $A \cap [r_i, 1]$ is well-ordered in decreasing ordering. Hence, this sequence contains an infinite subsequence consisting of equal elements or an infinite monotonically decreasing subsequence.

Let this subsequence be $p_{k,n_1}^{(k-1)}, p_{k,n_2}^{(k-1)}, \ldots$. We choose $x_{n_1}^{(k-1)}, x_{n_2}^{(k-1)}, \ldots$ as the sequence $x_1^{(k)}, x_2^{(k)}, \ldots$.

$x_1^{(s_0)}, x_2^{(s_0)}, \ldots$ is the needed sequence $x'_1, x'_2, \ldots$. We have

$$p_{k,1} = p_{k,2} = \ldots \quad\text{or}\quad p_{k,1} > p_{k,2} > \ldots$$

because such a property holds for the sequence $x_1^{(k)}, x_2^{(k)}, \ldots$ and $x'_1, x'_2, \ldots$ is a subsequence of $x_1^{(k)}, x_2^{(k)}, \ldots$. □

We have

$$p_{1,1} \ge p_{1,2} \ge \ldots$$

$$\ldots$$

$$p_{s,1} \ge p_{s,2} \ge \ldots.$$

By Lemma 4,

$$x'_1 \ge x'_2 \ge \ldots.$$

Hence, $x_1, x_2, \ldots$ contains an infinite non-increasing subsequence. □

This is a contradiction with the assumption that $x_1, x_2, \ldots$ is monotonically increasing.

□

Next, we construct a system of notations $S$ for $A \cap [\frac{1}{n+1}, 1]$. We start with technical results necessary for our construction. In section 3.5.3, we show how to distinguish limit elements from successor elements. In section 3.5.4, we define $(x, d)$-minimal sets and show that such sets can be computed algorithmically. Finally, in section 3.5.5, we use these results to construct a system of notations.



3.5.3 Distinguishing elements of different types 

The maximal element of the set $A$ is 1. It does not belong to $A \cap [r_{i+1}, r_i]$. Hence, $A \cap [r_{i+1}, r_i]$ does not contain the maximal element and, constructing a system of notations, we should distinguish numbers $p$ of three types:

1. $p \in A \cap [r_{i+1}, r_i]$ and $p$ is a successor. Then $k_S(p) = 1$.

2. $p \in A \cap [r_{i+1}, r_i]$ and $p$ is a limit element. Then $k_S(p) = 2$.

3. $p \notin A \cap [r_{i+1}, r_i]$. Then $k_S(p) = 3$.

The two lemmas below show how to distinguish between limit and successor elements.

Lemma 19 Let $x \in A \cap [r_{i+1}, r_i]$. Then $x$ is a limit element if and only if it can be generated by rule 2 so that at least one of $p_1, \ldots, p_s$ is a limit element.

Proof.

"if" part. Assume that $p_j$ is a limit element. Let $p_{j,1}, p_{j,2}, \ldots$ be a monotonically decreasing sequence converging to $p_j$ and $x_k$ be the number generated by the application of the rule 2 to $p_1, \ldots, p_{j-1}, p_{j,k}, p_{j+1}, \ldots, p_s$. Then, $x_1, x_2, \ldots$ is a monotonically decreasing sequence converging to $x$. Hence, $x$ is a limit element.

"only if" part. Let $x$ be a limit element and $x_1, x_2, \ldots$ be a monotonically decreasing sequence converging to $x$. We apply Claim 18.1 to $x_1, x_2, \ldots$ and obtain a subsequence $x'_1, x'_2, \ldots$.

We consider the sequences $p_{j,1}, p_{j,2}, \ldots$. Let

$$p'_j = \lim_{k \to \infty} p_{j,k}.$$

By Lemma 5, $x$ can be generated from $p'_1, p'_2, \ldots, p'_s$ by an application of rule 2. We have

$$p_{j,1} = p_{j,2} = \ldots \quad\text{or}\quad p_{j,1} > p_{j,2} > \ldots$$

for any $j \in \{1, \ldots, s\}$. If $p_{j,1} = p_{j,2} = \ldots$ for all $j$, then $x'_1 = x'_2 = \ldots$. A contradiction with the assumption that $x_1, x_2, \ldots$ is monotonically decreasing.

Hence,

$$p_{j,1} > p_{j,2} > \ldots$$

for at least one $j$ and $p'_j = \lim_{k \to \infty} p_{j,k}$ is a limit element. □

Lemma 20 Let $x \in A_n$. Then $x$ is a limit element.

Proof. We have three cases.

1. $x = \frac{p}{1+p}$ for $p \in A \cap [\frac{1}{n}, \frac{1}{n-1}]$ and $p$ is a limit element.

Let $p$ be the limit of $p_1, p_2, \ldots$. Then, $\frac{p}{1+p}$ is the limit of $\frac{p_1}{1+p_1}, \frac{p_2}{1+p_2}, \ldots$ because the function $\frac{x}{1+x}$ is continuous.

2. $x = \frac{p}{1+p}$ for $p \in A \cap [\frac{1}{n}, \frac{1}{n-1}]$ and $p$ is not a limit element.

We take the segment $[\frac{p}{1+p}, \frac{r}{1+r}]$ obtained in the first step of the splitting and the corresponding sequence $r_0, r_1, \ldots$. $\frac{p}{1+p}$ is the limit of $r_0, r_1, \ldots$.

3. $x \ne \frac{p}{1+p}$ for any $p \in A \cap [\frac{1}{n}, \frac{1}{n-1}]$.

Then, $x = r_i$. We prove the lemma by induction over $i$.

Base Case. If $i = 0$, then $r_i = \frac{r}{1+r}$ and we already know that $\frac{r}{1+r}$ is a limit element.

Inductive Case. Lemma 2 and the definition of $r_{i+1}$ imply that $r_{i+1} \in A$ follows from $r_i \in A$ and $p \in A$ by the rule 2. If $r_i$ is a limit element, then, by Lemma 19, $r_{i+1}$ is a limit element, too.



3.5.4 $(x, d)$-minimal sets

In the algorithms of subsection 3.5.5, we will often need to compute the largest element of $A \cap [r_{i+1}, r_i]$ which is less than some given $x$. This will be done by checking $p_1 \in A \cap [r_i, 1]$, $p_2 \in A \cap [r_i, 1]$, $\ldots$, $p_s \in A \cap [r_i, 1]$ that can generate $x \in A$ by rule 2. There are infinitely many possible combinations of $p_1, \ldots, p_s$. Hence, we need

• to prove that it is enough to check finitely many combinations $p_1 \in A \cap [r_i, 1]$, $p_2 \in A \cap [r_i, 1]$, $\ldots$, $p_s \in A \cap [r_i, 1]$, and

• to construct an algorithm finding the list of combinations $p_1 \in A \cap [r_i, 1]$, $p_2 \in A \cap [r_i, 1]$, $\ldots$, $p_s \in A \cap [r_i, 1]$ which must be checked when the functions $k_S$, $p_S$, $q_S$ are computed.

We do it below. First, we give formal definitions.

Definition 8 A tuple $(p_1, \ldots, p_s)$ is said to be $(x,d)$-allowed if $p_1 \in A \cap [r_i, 1]$, $\ldots$, $p_s \in A \cap [r_i, 1]$ and $\sum_{j=1}^{s} \left(\frac{x}{p_j} + x - 1\right) \le d$.

Definition 9 A tuple $(p_1, \ldots, p_s)$ is said to be less than or equal to $(p'_1, \ldots, p'_s)$ if $p_1 \le p'_1$, $\ldots$, $p_s \le p'_s$.

Definition 10 A set of tuples $P$ is said to be $(x,d)$-minimal if

1. It contains only $(x,d)$-allowed tuples;

2. For each $(x,d)$-allowed tuple $(p_1, \ldots, p_s)$ there is a tuple belonging to $P$ which is less than or equal to $(p_1, \ldots, p_s)$.
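
Definitions 8 and 9 translate directly into two small predicates. The sketch below is illustrative only: it does not test membership in $A$ (the caller is assumed to pass elements of $A$), and the names `cost`, `is_allowed` and `tuple_leq` are ours.

```python
from fractions import Fraction

def cost(x, p):
    """The term x/p + x - 1 contributed by one component p."""
    return x / p + x - 1

def is_allowed(tup, x, d, r_i):
    """(x, d)-allowed: every component lies in [r_i, 1] and the costs sum to at most d.

    Membership of the components in A is assumed, not checked.
    """
    in_range = all(r_i <= p <= 1 for p in tup)
    return in_range and sum(cost(x, p) for p in tup) <= d

def tuple_leq(a, b):
    """Componentwise comparison of two tuples of the same length."""
    return len(a) == len(b) and all(u <= v for u, v in zip(a, b))

x = Fraction(4, 9)
r_i = Fraction(1, 2)
pair = (Fraction(1, 2), Fraction(2, 3))   # costs 1/3 and 1/9, total 4/9
```

Larger components have smaller cost, so the allowed tuples of a given length form an upward-closed family; a minimal set only needs to keep its lowest members.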

The next three lemmas show why $(x, d)$-allowed tuples and $(x, d)$-minimal sets are important for our construction.

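Lemma 21 $(p_1, \ldots, p_s)$ is $(x,x)$-allowed if and only if the application of the rule 2 to $p_1, \ldots, p_s$ generates a number $p$ satisfying $p \ge x$.

Proof. Let $d = \sum_{j=1}^{s} \left(x + \frac{x}{p_j} - 1\right)$. $(p_1, \ldots, p_s)$ is $(x, x)$-allowed if and only if $d \le x$. Hence, it is enough to prove that $d \le x$ if and only if $x \le p$. Since $p = \sum_{j=1}^{s} \left(p + \frac{p}{p_j} - 1\right)$ by the rule 2,

$$d = \sum_{j=1}^{s} \left(x + \frac{x}{p_j} - 1\right) = \sum_{j=1}^{s} \left(x + \frac{x}{p_j} - 1\right) - p + p = (x - p) \sum_{j=1}^{s} \left(1 + \frac{1}{p_j}\right) + p.$$

We have

$$\sum_{j=1}^{s} \left(1 + \frac{1}{p_j}\right) \ge 1 + \frac{1}{p_1} > 1.$$

Hence, if $x > p$, then $(x - p) > 0$ and $d > (x - p) + p = x$. If $x \le p$, then $(x - p) \le 0$ and $d \le (x - p) + p = x$. □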
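
Lemma 21 can be checked exhaustively on a small sample. The sketch below assumes that rule 2 determines $p$ from $p_1, \ldots, p_s$ through $q_j = \frac{p}{p_j} - 1 + p$ and $\sum_j q_j = p$, which solves to $p = s / (s - 1 + \sum_j \frac{1}{p_j})$; any side conditions on the $q_j$ are ignored, and the names `rule2` and `allowed_sum` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def rule2(ps):
    """Number generated from p_1..p_s, assuming p = s / (s - 1 + sum(1/p_j))."""
    s = len(ps)
    return Fraction(s) / (s - 1 + sum(1 / p for p in ps))

def allowed_sum(ps, x):
    """Left-hand side of the (x, d)-allowed condition."""
    return sum(x / p + x - 1 for p in ps)

# A few elements of A from [1/2, 1] and one target value x.
candidates = [Fraction(1, 2), Fraction(4, 7), Fraction(3, 5), Fraction(2, 3), Fraction(1)]
x = Fraction(4, 9)

# Lemma 21 on every pair: (x, x)-allowed exactly when the generated number is >= x.
for ps in product(candidates, repeat=2):
    assert (allowed_sum(ps, x) <= x) == (rule2(ps) >= x)
```

With this formula, `rule2` applied to $(1, 1)$ gives $\frac{2}{3}$ and applied to $(\frac{1}{2}, \frac{2}{3})$ gives $\frac{4}{9}$, so the sample value of $x$ is itself generated by one of the pairs.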

Lemma 22 Let $P$ be a $(x,x)$-minimal set. Then, for any $p_1, \ldots, p_s$ that generates $p \ge x$ by an application of the rule 2, there exists a tuple $(p'_1, \ldots, p'_s) \in P$ such that $p'_1 \le p_1$, $\ldots$, $p'_s \le p_s$.

Proof. By Lemma 21, $(p_1, \ldots, p_s)$ is $(x, x)$-allowed. By the definition of $(x, x)$-minimal set, $P$ contains a tuple $(p'_1, \ldots, p'_s)$ such that $p'_1 \le p_1$, $\ldots$, $p'_s \le p_s$. □

Lemma 23 Let $P$ be a $(x, x)$-minimal set, $p_1 \in A \cap [r_i, 1]$, $\ldots$, $p_s \in A \cap [r_i, 1]$. If $x \in A$ follows from $p_1, \ldots, p_s \in A$ and the rule 2, then $(p_1, \ldots, p_s) \in P$.

Proof. By Lemma 21, $(p_1, \ldots, p_s)$ is $(x, x)$-allowed. Hence, by Lemma 22, there exists an $(x, x)$-allowed $(p'_1, \ldots, p'_s) \in P$ such that $p'_1 \le p_1$, $\ldots$, $p'_s \le p_s$.

Let $x'$ be the number generated by an application of the rule 2 to $p'_1 \in A$, $\ldots$, $p'_s \in A$. If $p'_j < p_j$ for some $j$, then $x' < x$ (Lemma 4) and $(p'_1, \ldots, p'_s)$ is not $(x, x)$-allowed (Lemma 21).

However, an $(x, x)$-minimal set contains only $(x, x)$-allowed tuples. Hence, $p_1 = p'_1$, $\ldots$, $p_s = p'_s$, i.e. $(p_1, \ldots, p_s) \in P$. □

The next lemma shows that $(x, d)$-minimal sets can be computed algorithmically. Its proof also shows that a finite $(x, d)$-minimal set always exists.

Lemma 24 Assume that a system of notations for $A \cap [r_i, 1]$ is given. There is an algorithm xdminimal$(x,d)$ which receives $x \in A \cap [r_{i+1}, r_i]$ and $d \in [0, x]$ and returns a $(x,d)$-minimal set.

Proof. We use an auxiliary procedure findsmallest$(P,x,d)$. It receives numbers $x$, $d$ and an $(x, d)$-minimal set $P$ and returns the smallest $d'$ such that $d' > d$ and $\sum_{j=1}^{s} \left(\frac{x}{p_j} + x - 1\right) = d'$ for some $p_1, \ldots, p_s \in A$.

Both findsmallest and xdminimal use a constant $p_0$. $p_0$ is defined as the largest number in $A \cap [r_i, 1]$ such that $x + \frac{x}{p_0} - 1 > 0$. Equivalently, $p_0$ is the number in $A \cap [r_i, 1]$ with the smallest $x + \frac{x}{p_0} - 1$ such that $x + \frac{x}{p_0} - 1 > 0$. $\Delta$ denotes $\frac{x}{p_0} + x - 1$.

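Algorithm findsmallest$(P, x, d)$:

1. Let $d' = 1$;

2. For each $(p_1, \ldots, p_s) \in P$ do:

(a) For each $j \in \{1, \ldots, s\}$:

i. Find $p'_j = \max\{p \mid p \in A \cap [r_i, 1] \text{ and } p < p_j\}$, using the given system of notations for $A \cap [r_i, 1]$.

ii. $d_1 = \sum_{l=1}^{j-1} \left(\frac{x}{p_l} + x - 1\right) + \left(\frac{x}{p'_j} + x - 1\right) + \sum_{l=j+1}^{s} \left(\frac{x}{p_l} + x - 1\right)$. If $d_1 > d$, then $d' = \min(d', d_1)$.

(b) $d_2 = \sum_{j=1}^{s} \left(\frac{x}{p_j} + x - 1\right) + \left(\frac{x}{p_0} + x - 1\right)$; If $d_2 > d$, then $d' = \min(d', d_2)$.

3. Return $d'$ as the result;
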
Algorithm xdminimal$(x, d)$

1. Let $P = \emptyset$;

2. If $d < \Delta$, return the empty set as the result;

3. Let $y$ be the smallest number in $A \cap [r_i, 1]$ such that $\frac{x}{y} + x - 1 \le d$.

4. while ($\frac{x}{y} + x - 1 > 0$) do:

(a) $d' = d - \left(\frac{x}{y} + x - 1\right)$;

(b) $P_1 =$ xdminimal$(x, d')$;

(c) If $P_1 = \emptyset$, add $(y)$ to $P$. Otherwise, for each $(p_1, \ldots, p_s) \in P_1$, add $(y, p_1, \ldots, p_s)$ to $P$;

(d) Replace $y$ by a greater element of $A \cap [r_i, 1]$:

i. If $y$ is a successor element, replace $y$ by $p_S(y)$, using the given system of notations for $A \cap [r_i, 1]$;

ii. If $y$ is a limit element, replace $y$ by $y'$ where $y'$ is the smallest element of $A \cap [r_i, 1]$ such that

$$\frac{x}{y'} + x - 1 \le d - \text{findsmallest}(P_1, x, d').$$

5. Return $P$.
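
For intuition about what xdminimal returns, a minimal set can be computed by brute force when $A \cap [r_i, 1]$ is replaced by a finite list of candidates. The sketch below is not the algorithm above: it has no system of notations and no jump at limit elements, it simply enumerates every allowed tuple over the finite list and discards the dominated ones. All names and the candidate list are illustrative assumptions.

```python
from fractions import Fraction

def cost(x, p):
    return x / p + x - 1

def allowed_tuples(x, d, candidates):
    """All tuples over `candidates` with positive costs summing to at most d."""
    found = []
    for y in candidates:
        c = cost(x, y)
        if c <= 0 or c > d:
            continue  # skip components outside the range used by the while loop
        found.append((y,))
        # Spend the remaining budget d - c on the rest of the tuple.
        for rest in allowed_tuples(x, d - c, candidates):
            found.append((y,) + rest)
    return found

def minimal_set(x, d, candidates):
    """Allowed tuples that no other allowed tuple of the same length is <= to."""
    tuples = allowed_tuples(x, d, candidates)

    def dominated(t):
        return any(
            u != t and len(u) == len(t) and all(a <= b for a, b in zip(u, t))
            for u in tuples
        )

    return {t for t in tuples if not dominated(t)}

candidates = [Fraction(1, 2), Fraction(4, 7), Fraction(3, 5), Fraction(2, 3), Fraction(1)]
x = Fraction(4, 9)
result = minimal_set(x, x, candidates)
```

The recursion terminates for the same reason as in Claim 24.1: every component used costs at least the smallest positive cost, so the remaining budget shrinks by a fixed amount at each level.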

Proof of correctness for xdminimal$(x, d)$. We prove the correctness by induction over $\lfloor d / \Delta \rfloor$.

Base Case. $d \in [0, \Delta[$.

Then, $\frac{x}{y} + x - 1 \ge \Delta$ for any $y$. Hence, $\sum_{j=1}^{s} \left(\frac{x}{p_j} + x - 1\right) \ge \Delta$ for any $(p_1, \ldots, p_s)$ and there are no $(x, d)$-allowed tuples. In this case, the algorithm returns the empty set. Hence, it works correctly.

Inductive Case. We assume that the lemma holds for $d \in [0, k\Delta[$ and prove it for $d \in [k\Delta, (k+1)\Delta[$. We use

Claim 24.1 If xdminimal$(x,d)$ calls xdminimal$(x,d')$, then $d' \le d - \Delta$.

Proof. From the description of xdminimal we have $d' = d - \left(\frac{x}{y} + x - 1\right)$. By definition of $p_0$ and $\Delta$, $\frac{x}{y} + x - 1 \ge \Delta$ and $d' \le d - \Delta$. □

Hence, xdminimal$(x, d)$ calls only xdminimal$(x, d')$ with $d' < (k+1)\Delta - \Delta = k\Delta$. The correctness of xdminimal$(x, d')$ for such values follows from the inductive assumption.

First, we prove that the computation of xdminimal$(x, d)$ always terminates. Each xdminimal$(x,d')$ called by xdminimal$(x,d)$ terminates because xdminimal$(x,d')$ is correct. Hence, each iteration of the while loop terminates and, if xdminimal$(x,d)$ does not stop, then the while loop is executed infinitely many times.

Let $y_j$ be the value of $y$ during the $j$-th execution of the while loop. $y$ is increased at the end of each iteration. Hence, $y_1 < y_2 < \ldots$.

$y_1 \in A \cap [r_i, 1]$, $y_2 \in A \cap [r_i, 1]$, $\ldots$. If the while loop is executed infinitely many times, then $y_1, y_2, \ldots$ is an infinite monotonically increasing sequence. However, $A \cap [r_i, 1]$ does not contain such sequences because it is well-ordered.

Hence, the while loop is executed finitely many times and xdminimal$(x, d)$ terminates. Let $P =$ xdminimal$(x, d)$. Next, we prove that $P$ is a $(x, d)$-minimal set.

For a contradiction, assume that it is not. Then, there exists an $(x, d)$-allowed tuple $(p_1, \ldots, p_s)$ such that $P$ does not contain any tuple that is less than or equal to $(p_1, \ldots, p_s)$.

We assume that $(p'_1, p_2, \ldots, p_s)$ is not $(x, d)$-allowed for any $p'_1 \in A \cap [r_i, 1]$ satisfying $p'_1 < p_1$. (Otherwise, we can replace $p_1$ by the smallest $p'_1 \in A \cap [r_i, 1]$ such that $(p'_1, p_2, \ldots, p_s)$ is $(x, d)$-allowed.)

Consider two cases:

1. In xdminimal$(x, d)$, the while loop is executed with $y = p_1$.

Denote $d' = d - \left(\frac{x}{p_1} + x - 1\right)$. The tuple $(p_2, \ldots, p_s)$ is $(x, d')$-allowed.

xdminimal$(x,d)$ calls xdminimal$(x, d')$. xdminimal$(x,d')$ works correctly, i.e. returns an $(x, d')$-minimal set $P_1$. Hence, $P_1$ contains a tuple $(p'_2, \ldots, p'_s)$ that is less than or equal to $(p_2, \ldots, p_s)$.

xdminimal$(x, d)$ adds $(p_1, p'_2, \ldots, p'_s)$ to $P$ because $(p'_2, \ldots, p'_s)$ belongs to the set returned by xdminimal$(x,d')$. Hence, $P$ contains the tuple $(p_1, p'_2, \ldots, p'_s)$ that is less than or equal to $(p_1, p_2, \ldots, p_s)$. A contradiction.

2. The while loop is not executed with $y = p_1$.

Let $y_1$ be the greatest number such that $y_1 < p_1$ and the while loop is executed with $y = y_1$. Let $y_2$ be the number by which $y_1$ is replaced at the end of the while loop.

$y_1 < y_2$ because $y$ is always replaced by a greater number. By definition of $y_1$, $y_2 > p_1$. (Otherwise $y_2$ would have been chosen instead of $y_1$.)

(a) $y_1$ is a successor element.

Then, $y_1$, $y_2$, $p_1$ all belong to $A$ and $y_1 < p_1 < y_2$. When xdminimal$(x,d)$ replaces $y_1$ by a greater element of $A$, it chooses the smallest element of $A$ that is greater than $y_1$. It can be $p_1$ or some number between $y_1$ and $p_1$ but not $y_2$. A contradiction.

(b) $y_1$ is a limit element.

We assumed that $(p'_1, p_2, \ldots, p_s)$ is not $(x, d)$-allowed for any $p'_1 \in A \cap [r_i, 1]$ satisfying $p'_1 < p_1$. Hence, $(y_1, p_2, \ldots, p_s)$ is not $(x, d)$-allowed, i.e.

$$\sum_{j=2}^{s} \left(\frac{x}{p_j} + x - 1\right) > d - \left(\frac{x}{y_1} + x - 1\right) = d'.$$

By the definition of findsmallest, this gives findsmallest$(P_1, x, d') \le \sum_{j=2}^{s} \left(\frac{x}{p_j} + x - 1\right)$. On the other hand,

$$\frac{x}{p_1} + x - 1 + \sum_{j=2}^{s} \left(\frac{x}{p_j} + x - 1\right) \le d$$

because $(p_1, p_2, \ldots, p_s)$ is $(x, d)$-allowed. Hence,

$$\frac{x}{p_1} + x - 1 \le d - \sum_{j=2}^{s} \left(\frac{x}{p_j} + x - 1\right) \le d - \text{findsmallest}(P_1, x, d').$$

By the definition, $y_2$ is the smallest number such that

$$\frac{x}{y_2} + x - 1 \le d - \text{findsmallest}(P_1, x, d').$$

This implies $y_2 \le p_1$. A contradiction with $y_2 > p_1$.

□



3.5.5 System of notations 

We show how to extend a system of notations $S$ from $A \cap [\frac{1}{n}, 1]$ to $A \cap [\frac{1}{n+1}, 1]$. Below, we give the algorithms computing $k_S(x)$, $p_S(x)$ and $q_S(x)$ for $x \in [\frac{1}{n+1}, \frac{1}{n}]$. These algorithms use the procedure xdminimal$(x, d)$ defined in the previous subsection. They also use the system $S$ for $A \cap [\frac{1}{n}, 1]$.

Function $k_S(x)$.

1. Use the system for $A \cap [\frac{1}{n}, \frac{1}{n-1}]$ to find whether $x = \frac{p}{1+p}$ for some $p \in A \cap [\frac{1}{n}, \frac{1}{n-1}]$. If yes, then $k_S(x) = 2$.

2. Otherwise, find the segments $[\frac{p}{1+p}, \frac{r}{1+r}]$ and $[r_{i+1}, r_i]$ containing $x$. If $x = r_{i+1}$ or $x = r_i$, then $k_S(x) = 2$.

3. Otherwise, find an $(x, x)$-minimal set $P$ using xdminimal$(x,x)$.

4. If there exists $(p_1, \ldots, p_s) \in P$ such that $x$ is generated by an application of the rule 2 to $p_1, \ldots, p_s$ and at least one of $p_1, \ldots, p_s$ is a limit element, then $k_S(x) = 2$.

5. Otherwise, if there exists $(p_1, \ldots, p_s) \in P$ such that $x$ is generated by an application of the rule 2 to $p_1, \ldots, p_s$, then $k_S(x) = 1$.

6. Otherwise, $k_S(x) = 3$.

Function $p_S(x)$.

1. Find the interval $[r_{i+1}, r_i]$ containing $x$. Execute xdminimal$(x,x)$ and find a $(x, x)$-minimal set $P$.

2. Let $P_1$ be the set consisting of all tuples $(p_1, \ldots, p_s)$ such that

(a) $(p_1, \ldots, p_s) \in P$ or

(b) $(p_1, \ldots, p_{j-1}, p', p_{j+1}, \ldots, p_s) \in P$ and $p_j = p_S(p')$ for some $j \in \{1, \ldots, s\}$ or

(c) $(p_1, \ldots, p_{j-1}, p', p_{j+1}, \ldots, p_s) \in P$ for some $j \in \{1, \ldots, s\}$ and $p' \in A \cap [r_i, 1]$.

3. For each tuple $(p_1, \ldots, p_s) \in P_1$ find the number $p \in A$ generated by an application of the rule 2 to $p_1, \ldots, p_s$.

$p_S(x)$ is the smallest of those $p$ which are greater than $x$.

Function $q_S(x)$.

1. If $x = \frac{p}{1+p}$, $p \in A \cap [\frac{1}{n}, \frac{1}{n-1}]$ and $p$ is a limit element, $q_S(x)$ is a program computing

$$\frac{\varphi_{q_S(p)}(0)}{1 + \varphi_{q_S(p)}(0)},\; \frac{\varphi_{q_S(p)}(1)}{1 + \varphi_{q_S(p)}(1)},\; \ldots$$

2. If $x = \frac{p}{1+p}$, $p \in A \cap [\frac{1}{n}, \frac{1}{n-1}]$ and $p$ is a successor element, find $r = p_S(p)$. $q_S(x)$ is a program computing the sequence $r_0, r_1, \ldots$ corresponding to $[\frac{p}{1+p}, \frac{r}{1+r}]$.

3. Otherwise, search the set $P$ returned by xdminimal$(x, x)$ and find $p_1 \in A \cap [r_i, 1]$, $\ldots$, $p_s \in A \cap [r_i, 1]$ such that $x$ is generated by an application of the rule 2 to $p_1, \ldots, p_s$ and $p_j$ is a limit element.

$q_S(x)$ is a program computing the sequence $x_1, x_2, \ldots$ where $x_k$ is generated by an application of the rule 2 to $p_1, \ldots, p_{j-1}, \varphi_{q_S(p_j)}(k), p_{j+1}, \ldots, p_s$.

Lemma 25 $S$ is a system of notations for $A \cap [\frac{1}{n+1}, 1]$.

Proof. By transfinite induction over $A_n$.

Base Case. $S$ is a correct system of notations for $A \cap [\frac{1}{n}, 1]$.

Inductive Case. Let $y \in A_n$. We assume that $S$ is correct for all $A \cap [y', 1]$ with $y' \in A_n$ and $y' > y$ and prove that it is correct for $A \cap [y, 1]$. We consider two cases:

1. $y = \frac{p}{1+p}$ and $p \in A \cap [\frac{1}{n}, \frac{1}{n-1}]$.

Similarly to the proof of Lemma 20, $y$ is a limit of a sequence consisting of elements of $A_n$. Hence, if $x > y$, then $x > y'$ where $y'$ is some element of this sequence. The functions $k_S(x)$, $p_S(x)$, $q_S(x)$ are correct because $S$ is correct for $A \cap [y', 1]$ (by inductive assumption). It remains to prove the correctness of $k_S(x)$, $p_S(x)$, $q_S(x)$ for $x = y$.

$k_S(y) = 2$. This is correct because, by Lemma 20, $y$ is a limit element. The function $p_S(x)$ is defined only for successor elements. Hence, we do not need to check its correctness for the limit element $y$. The correctness of the sequence computed by $q_S(y)$ is proved in the proof of Lemma 20.

2. $y = r_{i+1}$ for $i \ge 0$. In this case, we assume that $S$ is correct for $A \cap [r_i, 1]$ and prove the correctness for $A \cap [r_{i+1}, r_i]$.

By Lemma 24, xdminimal$(x, d)$ returns an $(x, d)$-minimal set if it has access to a system of notations for $A \cap [r_i, 1]$. We know that $S$ is correct for $A \cap [r_i, 1]$. Hence, the set $P$ returned by xdminimal$(x,x)$ is $(x, x)$-minimal.

2.1. Proof of correctness for $k_S$.

If $x \in A \cap [r_{i+1}, r_i]$, then $x \in A$ follows from $p_1 \in A, \ldots, p_s \in A$ and the rule 2, for some $p_1, \ldots, p_s$. By Lemma 15, $p_1 \in A \cap [r_i, 1]$, $\ldots$, $p_s \in A \cap [r_i, 1]$. By Lemma 23, $(p_1, \ldots, p_s)$ belongs to $P$.

Correctness of xdminimal$(x,x)$ implies that, if $x \in A$, the algorithm computing $k_S$ finds $p_1, \ldots, p_s$ such that $x \in A$ follows from $p_1 \in A, \ldots, p_s \in A$ and the rule 2.

Hence, it distinguishes $x \in A$ and $x \notin A$ correctly. By Lemma 19, it distinguishes limit and successor elements correctly.

2.2. Proof of correctness for $p_S$.

We prove that $p_S(x)$ returns the element of $A \cap [r_{i+1}, r_i]$ immediately preceding $x$, i.e. $(\forall z \in A \cap [r_{i+1}, r_i])(x < z \Rightarrow p_S(x) \le z)$.

Let $z \in A \cap [r_{i+1}, r_i]$ and $x < z$. Consider $p_1, \ldots, p_s$ that generate $z \in A$ by rule 2.

$P$ contains a tuple $(p'_1, \ldots, p'_s)$ such that $p'_1 \le p_1$, $\ldots$, $p'_s \le p_s$ (Lemma 22). An application of the rule 2 to $p'_1, \ldots, p'_s$ generates $p \in A$ with $p \ge x$ (Lemma 21). Consider two cases:

(a) $p > x$.

The algorithm computing $p_S$ adds $(p'_1, \ldots, p'_s)$ to the set $P_1$. Later, it sets $p_S(x)$ equal to a number that is less than or equal to $p$. (This is true because $(p'_1, \ldots, p'_s) \in P_1$ and $p'_1, \ldots, p'_s$ generates $p > x$. The algorithm selects $p_S(x)$ as the smallest of all $p$ satisfying these conditions.)

By Lemma 4, $p \le z$. Hence, $p_S(x) \le p \le z$.

(b) $p = x$.

If $p_1 = p'_1$, $\ldots$, $p_s = p'_s$, then $p = z$. However, $p < z$. Hence, $p'_j < p_j$ for some $j$. Let $p''_j = p_S(p'_j)$. We have $p''_j \le p_j$ because $p_S(p'_j)$ is the smallest element of $A$ that is greater than $p'_j$. Let $p''$ denote the number generated by the rule 2 from $p'_1, \ldots, p'_{j-1}, p''_j, p'_{j+1}, \ldots, p'_s$.

By Lemma 4, $x < p''$. Hence, the algorithm for $p_S(x)$ adds $(p'_1, \ldots, p'_{j-1}, p''_j, p'_{j+1}, \ldots, p'_s)$ to the set $P_1$ and then, checking tuples in $P_1$, sets $p_S(x)$ equal to a number which is less than or equal to $p''$. This implies $p_S(x) \le p''$. From $p'_1 \le p_1$, $\ldots$, $p'_{j-1} \le p_{j-1}$, $p''_j \le p_j$, $p'_{j+1} \le p_{j+1}$, $\ldots$, $p'_s \le p_s$ it follows that $p'' \le z$ (Lemma 4). Hence, $p_S(x) \le p'' \le z$.

So, in both cases $p_S(x)$ is less than or equal to any $z \in A$ satisfying $x < z$. On the other hand, $p_S(x) \in A$ and $x < p_S(x)$. (It can be seen from the algorithm computing $p_S$.)

Hence, $p_S(x)$ is the smallest element of $A$ satisfying $x < p_S(x)$, i.e. the algorithm computes $p_S$ correctly.

2.3. Proof of correctness for $q_S$.

We already proved that, if there exist $p_1, \ldots, p_s$ such that $x \in A$ follows from $p_1 \in A, \ldots, p_s \in A$, then such a combination is found by xdminimal$(x, x)$ (see the proof of correctness for $k_S$). If there exists such a combination with one of $p_1, \ldots, p_s$ being a limit element, it is found. The algorithm computing $q_S$ generates a program computing the required sequence from such a combination correctly.

The correctness of $S$ for $A \cap [\frac{1}{n+1}, 1]$ follows by transfinite induction. □

By Lemmas 18 and 25, $A \cap [\frac{1}{n}, 1]$ is well-ordered and has a system of notations for any $n$. Hence, $A$ is well-ordered and has a system of notations. This completes the proof of Theorem 5. □

3.6 Universal simulation 

Theorem 7 For any $p \in A$ there exists $k$ such that PFIN$(x) \subseteq [pk, k]$PFIN for all $x$ which are greater than any $p' \in A \cap [0, p[$. There exists an algorithm which receives a probabilistic machine $M$ and a probability $x$ and outputs a team $L_1, \ldots, L_k$ which identifies the same set of functions.

Proof. By transfinite induction.

Base Case. For $p > \frac{1}{2}$, the theorem follows from the results of [14].

Inductive Case. We assume that the theorem is true for all $p \in A$ such that $p > p_0$ and prove it for $p = p_0$.

Let $p'_0$ be the largest element of $A$ for which $\frac{x}{p'_0} + x - 1 > 0$. $p'_0$ is always a successor element. (If it was a limit element, let $q_1, q_2, \ldots$ be a decreasing sequence that converges to $p'_0$. For some element $q_i$ in this sequence, $\frac{x}{q_i} + x - 1 > 0$, implying that $p'_0$ is not the largest element with this property.) Let $p''_0$ be the predecessor of $p'_0$.

Let $P$ be a $(x, x)$-minimal set (see Section 3.5.4). Let $P'$ be the set of all $p$ that appear in some tuple in the set $P$.

We define two functions $g(r)$ and $g'(r)$, for $r \in [0, x]$. To define $g(r)$, let $y$ be the smallest element of $A$ which is at least $\frac{x}{1-x+r}$. If $y = p''_0$, we define $g(r) = 0$. Otherwise, let $y'$ be the largest element of $P'$ satisfying $y' \le y$. Let $g(r)$ be the solution to $y' = \frac{p_0}{1 - p_0 + g(r)}$. (Equivalently, $g(r) = p_0 + \frac{p_0}{y'} - 1$.) To define $g'(r)$, let $S(r)$ be the set of all tuples $(r_1, r_2, \ldots, r_m)$ such that $r_1 + \ldots + r_m \le r$, $r_1 > 0, \ldots, r_m > 0$. Then,

$$g'(r) = \sup_{(r_1, r_2, \ldots, r_m) \in S(r)} \big( g(r_1) + g(r_2) + \ldots + g(r_m) \big).$$

In the simulation algorithm for $p = p_0$, we use several simulation algorithms for $p > p_0$ as subroutines. Namely, we use:

1. A simulation algorithm for $p = p''_0$.

2. Simulation algorithms for all $p \in P'$.

The existence of these simulation algorithms is implied by the assumption that Theorem 7 holds for $p > p_0$.

A $[p_0 k, k]$PFIN-team $L = \{L_1, \ldots, L_k\}$ simulates a probabilistic PFIN$(x)$-machine $M$ as follows:

1. $L_1, \ldots, L_k$ read $f(0), f(1), \ldots$, simulate $M$ and wait until the probability that $M$ has issued a conjecture reaches $x$. Then $p_0 k$ machines ($L_1, \ldots, L_{p_0 k}$) issue conjectures $h_1, \ldots, h_{p_0 k}$.

2. The first values of the functions computed by $h_1, \ldots, h_{p_0 k}$ are identical to the values of $f$, i.e.

$$\varphi_{h_1}(i) = \ldots = \varphi_{h_{p_0 k}}(i) = f(i)$$

for $i \le m$ where $f(m)$ is the last value of $f$ read by $L$ before issuing conjectures. The next values of these functions are computed as follows:

Let $n = m + 1$. Let $T = \{(f(0), f(1), \ldots, f(m))\}$. We repeat the following sequence of operations. For each segment $\rho = (f(0), \ldots, f(n-1))$ in $T$:

(a) Find all conjectures of $M$ (among ones issued until the probability reached $x$) that output $f(0), \ldots, f(n-1)$. Run each of those conjectures on input $n$. Let $d_1, \ldots, d_s$ be the values that are output by at least one of the conjectures. For $i \in \{1, \ldots, s\}$, let $r_i$ be the total probability of $M$'s conjectures outputting $f(n) = d_i$. The programs $h_1, \ldots, h_{p_0 k}$ output the next value as follows. Out of those programs which have output the segment $(f(0), \ldots, f(n-1))$, $g'(r_1)k$ output $f(n) = d_1$, $g'(r_2)k$ output $f(n) = d_2$ and so on. If the number of programs that have output the segment $(f(0), \ldots, f(n-1))$ is larger than $g'(r_1)k + g'(r_2)k + \ldots + g'(r_s)k$, the remaining programs are not necessary for the further steps. Make them output $f(n) = f(n+1) = \ldots = 0$, to ensure that every program computes a total function.

(b) The programs which have output f(n) = di then simulate the machine M on 
input f(0),...,f(n). If the total probability of M issuing a conjecture consistent 
with /(0), . . . , f(n) reaches x, invoke the simulation algorithm for simulating a 
probabilistic machine with the success probability p = 1 _^ +r by a team of (1— po + 
g(ri))k machines, pok of which have to be successful. Let g(ri)k of g'(ri)k programs 
which have output /(0), . . . , f(n) simulate the simulate the first g(ri)k machines in 
this simulation. If g'{rj)k > g(ri)k, make the remaining g'{rj)k — g{ri)k programs 
output f{n + 1) = f (n + 2) = . . . = 0. 

(c) Otherwise (if the probability of M issuing a conjecture consistent with /(0), . . . , f(n) 
does not reach x), add (/(0), . . . , f(n)) to the set T. 

After the previous three steps have been done for every segment (/(0), . . . , f(n — 1)) G 
T, increase n by 1 and repeat. 
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3. After Li, . . . , L Po k have issued conjectures, all remaining machines in the team L read 
the next values of the input function and simulate the conjectures issued by the prob- 
abilistic machine M before conjectures of Li, . . . ,L po k. They wait until the step 2b 
happens, for a segment /(0), . . . , f(n) consistent with the input. Then, L po k+i, . . ., 
Lk (i.e. all machines which have not issued conjectures yet) participate in one of two 
simulations: 

(a) If g(ri) > 0, they, together with g(ri)k of programs hi, ... , h po k, form an [pok, (1 — 
Po + g{ r i))k] team. This team simulates a probabilistic machine M' according to 
the algorithm for p = yz^j^-y- (Note that, by the definition of g, p G P' .) 

The machine M' is defined as follows. It reads /(0), . . ., f(n) and then simulates 
M. If, while reading /(0), . . ., f{n), M outputs a conjecture inconsistent with 
the segment /(0), . . ., f(n), M' restarts the simulation of M. If M outputs a 
conjecture consistent with /(0), . . ., f(n), M' outputs this conjecture as well. If 
M outputs no conjecture while reading /(0), . . ., fin), M' proceeds to read the 
next values of / and keeps simulating M. 

(b) If g{j-i) = 0, they form an [pok, (1 — po)k] team and simulate the probabilistic 
machine M' , defined as above, according to the algorithm for p = p'q. 
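
The bookkeeping in step 2(a) can be sketched in a few lines of Python. This is only an illustration: the function name `allocate`, the use of integer program ids, and the example shares are assumptions made here, and the shares $g'(r_i)k$ are taken as already computed inputs.

```python
def allocate(programs, shares):
    """Split the programs that output the current segment among its extensions.

    programs: list of program ids that have output (f(0), ..., f(n-1)).
    shares:   dict mapping a candidate value d_i to the number of programs
              (g'(r_i) * k in the text) that must output f(n) = d_i.
    Returns (assignment, padding): padding holds the surplus programs,
    which output 0 forever so that they still compute total functions.
    """
    needed = sum(shares.values())
    if needed > len(programs):
        # The correctness proof shows this situation cannot arise.
        raise ValueError("not enough programs for the requested shares")
    assignment, pos = {}, 0
    for value, count in shares.items():
        assignment[value] = programs[pos:pos + count]
        pos += count
    return assignment, programs[pos:]


# Hypothetical example: 10 programs, two extensions needing 4 and 3 programs.
assignment, padding = allocate(list(range(10)), {7: 4, 9: 3})
print(assignment, padding)
```

The `ValueError` branch corresponds to the second statement of the correctness proof: the shares requested for the extensions never exceed the number of programs that reached the segment.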

Proof of correctness. We need to show two statements. 

• When we use a $[p_0 k, (1 - p_0 + g(r_i))k]$ or a $[p_0 k, (1 - p_0)k]$ team to simulate a
probabilistic machine, the team is able to perform the simulation.

• For a segment $(f(0), f(1), \ldots, f(n-1)) \in T$, the sum of the numbers $g'(r_i)k$ of programs
$h_1, \ldots, h_{p_0 k}$ asked to output various extensions $(f(0), f(1), \ldots, f(n-1), f(n))$ of this
segment is never more than the number of programs which have output the segment
$(f(0), f(1), \ldots, f(n-1))$.

We start with the first statement. We consider two cases. 

1. Case 3a.

Here, we use $g(r_i)k$ programs and $(1 - p_0)k$ machines $L_{p_0 k+1}, \ldots, L_k$ to simulate $M'$ on
functions with the given $f(0), \ldots, f(n)$ according to the algorithm for $p = \frac{p_0}{1 - p_0 + g(r_i)}$.

The success probability for $M'$ is at least $\frac{x}{1-x+r_i}$, since $M$ succeeds with probability at
least $x$ and the probability that $M$ outputs a conjecture inconsistent with $f(0), \ldots, f(n)$
is $x - r_i$. Let $y$ and $y'$ be as in the definition of $g(r_i)$. Then, $\frac{x}{1-x+r_i}$ is greater than
any $p' \in A \cap [0, y[$. Since $y \ge y'$, $\frac{x}{1-x+r_i}$ is also greater than any $p' \in A \cap [0, y'[$.
Therefore, by inductive assumption, it can be simulated by a $[y'k', k']$ team, for some
$k'$. Since $y' = \frac{p_0}{1 - p_0 + g(r_i)}$, a $[p_0 k, (1 - p_0 + g(r_i))k]$ team can do this task, as long as $k$
is appropriately chosen.

2. Case 3b.

Here, we use $(1 - p_0)k$ machines $L_{p_0 k+1}, \ldots, L_k$ to simulate $M'$ on functions with the
given $f(0), \ldots, f(n)$ according to the algorithm for $p = p'_0$.

The success probability for $M'$ is at least $\frac{x}{1-x+r_i}$, by the same argument as before.
Since $g(r_i) = 0$, this is more than any $p' \in A \cap [0, p'_0[$. By inductive assumption, this
means $M'$ can be simulated by a $[p'_0 k', k']$ team, for some $k'$. We will choose $k$ so
that $k' = (1 - p_0)k$. Then, $M'$ can be simulated by a $[p'_0(1 - p_0)k, (1 - p_0)k]$ team.
It remains to prove that this simulation yields at least $p_0 k$ correct programs. This is
equivalent to $p'_0(1 - p_0)k \ge p_0 k$.

If $p'_0 < \frac{p_0}{1-p_0}$, then $\frac{p'_0}{1+p'_0}$ would belong to the interval $[x, p_0[$, contradicting the
assumption that this interval does not contain any elements of $A$. Therefore, $p'_0(1 - p_0) \ge p_0$
and $p'_0(1 - p_0)k \ge p_0 k$.
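
The inequality closing Case 3b can be checked numerically. The sketch below uses exact rational arithmetic; the function names and the sample values are assumptions chosen for illustration and are not claimed to be the actual $p_0$ and $p'_0$ of any step of the proof.

```python
from fractions import Fraction


def shrink(t):
    """The map t -> t / (1 + t) that appears throughout the hierarchy."""
    return t / (1 + t)


def enough_correct(p0, p0_prime):
    """True when a [p0' * (1 - p0) * k, (1 - p0) * k] team yields at least p0 * k correct programs."""
    return p0_prime * (1 - p0) >= p0


p0 = Fraction(2, 5)          # hypothetical success ratio
threshold = p0 / (1 - p0)    # the condition is p0' >= p0 / (1 - p0)

# t -> t/(1+t) sends the threshold back to p0, so any p0' below the
# threshold would map strictly below p0.
print(threshold, shrink(threshold))
print(enough_correct(p0, Fraction(2, 3)), enough_correct(p0, Fraction(3, 5)))
```

Because $t \mapsto t/(1+t)$ is increasing, a value of $p'_0$ below the threshold is mapped below $p_0$, which is the situation the argument rules out.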

Next, we show that the programs output by $L_1, \ldots, L_{p_0 k}$ are sufficient to conduct the
necessary simulations. Let $(f(0), \ldots, f(n-1)) \in T$ be an initial segment, output by $M$ with
probability $r$, and let $r_1, \ldots, r_s$ be the probabilities of its possible extensions $(f(0), \ldots, f(n))$.
Then, the number of programs $h_1, \ldots, h_{p_0 k}$ outputting the segment $(f(0), \ldots, f(n-1))$ is $p_0 k$
if $n = m + 1$ and $g'(r)k$ if $n > m + 1$. The number of programs outputting its extensions
$(f(0), \ldots, f(n))$ is $g'(r_1)k, \ldots, g'(r_s)k$.

For the $n > m + 1$ case, it suffices to show that $\sum_{i=1}^{s} g'(r_i) \le g'(r)$ whenever $\sum_{i=1}^{s} r_i \le r$.
The $n = m + 1$ case follows from the $n > m + 1$ case, once we prove $g'(x) \le p_0$. We now
proceed to show those two results.

Lemma 26 If $\sum_{i=1}^{s} r_i \le r$, then $\sum_{i=1}^{s} g'(r_i) \le g'(r)$.

Proof. Immediate from the definition of $g'$. □

Lemma 27 $g'(x) \le p_0$.

Proof. We need to prove that $g(r_1) + \ldots + g(r_m) \le p_0$ whenever $r_1 + \ldots + r_m \le x$ and
$r_1, \ldots, r_m > 0$. Let $y'_i$ be the value of $y'$ in the calculation of $g(r_i)$.

We claim that the tuple $(y'_1, y'_2, \ldots, y'_m)$ is $(x, x)$-allowed. To prove that, we need to show
$\sum_{i=1}^{m} \left( x + \frac{x}{y'_i} - 1 \right) \le x$. This is true because $y'_i \ge \frac{x}{1-x+r_i}$ and, therefore,

$$x + \frac{x}{y'_i} - 1 \le x + \frac{x}{x/(1-x+r_i)} - 1 = r_i,$$

implying $\sum_{i=1}^{m} \left( x + \frac{x}{y'_i} - 1 \right) \le \sum_{i=1}^{m} r_i \le x$. Since the tuple is $(x, x)$-allowed, applying rule 2 to
it generates $p \ge x$. Since $p_0$ is the smallest element of $A$ satisfying $p_0 \ge x$, this also means
$p \ge p_0$.

Consider the application of rule 2 to $y'_1, y'_2, \ldots, y'_m$. Consider the values of $q_1, \ldots, q_m$ in
this application. We have $\frac{p}{q_i + 1 - p} = y'_i$, which is equivalent to $q_i = p + \frac{p}{y'_i} - 1$. Since $p \ge p_0$,
we have

$$q_i \ge (p - p_0) + p_0 + \frac{p_0}{y'_i} - 1 = (p - p_0) + g(r_i).$$

Summing over all $i$ gives

$$\sum_{i=1}^{m} g(r_i) \le \sum_{i=1}^{m} q_i - m(p - p_0) = p - m(p - p_0).$$

Since $p_0 \le p$, we have $p - p_0 \ge 0$ and the expression above is at most $p - (p - p_0) = p_0$. This
proves the lemma. □

The size of L. We show how to select the size of the team $L$ so that it will be able to
perform all described simulations. Two conditions must be satisfied:

1. When the machines of the team split, the number of machines saying that $f(m) = d_i$
must be an integer for any $d_i$, i.e., $g'(r)k$ must be an integer in all cases.

2. When the simulation algorithm for the success ratio $p_0$ uses another simulation
algorithm (with the ratio of successful machines $p' > p_0$), a certain team size $k'$ is required
for simulation with a $[p'k', k']$PFIN-team. The number of machines participating in this
simulation (when it is used as a subroutine of the simulation for the ratio $p_0$) must
be a multiple of $k'$.

For the first condition, notice that $g'(r)k$ is, by definition, a sum of $g(r)k$ for smaller $r$.
Therefore, it suffices to choose $k$ so that $g(r)k$ is an integer. By definition, $g(r) = p_0 + \frac{p_0}{y'} - 1$
where $y'$ belongs to a finite set $P'$. Since $P' \subseteq A$ and $A$ is a subset of the rational numbers
(section 3.4), this means that $k$ must be chosen so that $g(r)k$ is an integer for finitely many
rationals $g(r)$. Each of those requirements is equivalent to requiring that the denominator
of $g(r)$ divides $k$.

The second condition is equivalent to:

1. For all $p' \in P'$, the team size $(1 - p_0 + g(r))k = \frac{p_0}{y'}k$ (with $y' = p'$) must be a multiple of $k_i$, where $k_i$
is the size of the team with the success ratio $p'$.

2. $(1 - p_0)k$ must be a multiple of $k'_0$, the size of the simulation team with the success
ratio $p'_0$.

Overall, we have finitely many requirements. Each of them requires that the team size
is a multiple of some finite number of integers $k_1, \ldots, k_m$. If we select the size $k$ accordingly, the
simulation algorithm will be able to perform all required simulations. □
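
The divisibility requirements above reduce to a least-common-multiple computation. The following Python sketch shows one way to find a suitable $k$; the function names and all numeric values are hypothetical, and `math.lcm` assumes Python 3.9 or later.

```python
from fractions import Fraction
from math import gcd, lcm


def min_multiplier(c, unit=1):
    """Smallest k > 0 such that c * k is an integer multiple of `unit`.

    With c = a/b in lowest terms, c*k is a multiple of `unit` exactly when
    b*unit divides a*k, i.e. when k is a multiple of b*unit / gcd(a, b*unit).
    """
    need = c.denominator * unit
    return need // gcd(c.numerator, need)


def team_size(integer_ratios, subteam_requirements):
    """integer_ratios: rationals c for which c*k must be an integer.
    subteam_requirements: pairs (c, k_sub) for which c*k must be a
    multiple of the sub-team size k_sub."""
    k = 1
    for c in integer_ratios:
        k = lcm(k, min_multiplier(c))
    for c, k_sub in subteam_requirements:
        k = lcm(k, min_multiplier(c, k_sub))
    return k


# Hypothetical data: p0 = 2/5, two values of g(r), and a sub-team of size 4
# that must divide (1 - p0) * k.
k = team_size(
    [Fraction(1, 3), Fraction(1, 10), Fraction(2, 5)],
    [(Fraction(3, 5), 4)],
)
print(k)
```

Any multiple of the returned value satisfies the same requirements, so the choice of $k$ is not unique.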

Theorem 7 implies

Corollary 4 Let $x, y \in [0, 1]$ and $x < y$. If there is no $p \in A$ satisfying $x < p < y$, then

PFIN($x$) = PFIN($y$).

Proof. Any machine which succeeds with probability $y$ succeeds with probability $x < y$,
too. Hence, it suffices to prove that any machine with the probability of success $x$ can be
simulated by a machine with the probability of success $y$, i.e.

PFIN($x$) $\subseteq$ PFIN($y$).

Let $p$ be the smallest element of $A$ which is greater than $x$. Theorem 7 implies

PFIN($x$) $\subseteq$ $[pk, k]$PFIN $\subseteq$ PFIN($p$).

We have $y \le p$ and, hence,

PFIN($p$) $\subseteq$ PFIN($y$),
PFIN($x$) $\subseteq$ PFIN($y$).

□

So, if Theorem 4 does not prove that the power of learning machines with probabilities 
x and y is different, then these probabilities are equivalent. Hence, 

Theorem 8 $A$ is the probability hierarchy for probabilistic PFIN-type learning in the range
$[0, 1]$.

Proof. Follows from Theorem 4 and Corollary 4. □

Theorem 8 has the following important corollary.

Theorem 9 The probability structure of probabilistic PFIN-type learning is decidable, i.e. there
is an algorithm that receives as input two probabilities $p_1$ and $p_2$ and computes whether
PFIN($p_1$) = PFIN($p_2$).

Proof. Use the algorithm of Lemma 1 to find the intervals $[l_1(p_1), l_2(p_1)]$ and $[l_1(p_2), l_2(p_2)]$.
If these two intervals are equal, PFIN($p_1$) = PFIN($p_2$). Otherwise, PFIN($p_1$) $\ne$ PFIN($p_2$).
□

4 Relative complexity 

From Theorem 5 we know that the PFIN-type probability hierarchy is well-ordered. A question
arises: what is the ordering type of this hierarchy? To what particular ordinal is it
order-isomorphic? We analyze the proof of Theorem 5 step by step.

Let $\alpha(x)$ denote the ordering type of $A \cap ]x, 1]$ for $x \le \frac{1}{2}$ and the ordering type of $A \cap [x, 1]$
for $x > \frac{1}{2}$. If $x \ge y$, then $\alpha(x) \le \alpha(y)$ because $A \cap ]x, 1] \subseteq A \cap ]y, 1]$. We will often use this
inequality.

Lemma 28 $\alpha(\frac{1}{2}) = \omega$.

Proof. $A \cap ]\frac{1}{2}, 1]$ consists of a single sequence $1, 2/3, 3/5, \ldots$ [14]. □
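
The sequence in this proof has general term $n/(2n-1)$, the value obtained later in this section when rule 2 is applied to $n$ copies of 1. A short Python sketch (the function name `top_points` is an assumption) lists the first terms and checks that they decrease towards $1/2$ without reaching it.

```python
from fractions import Fraction


def top_points(count):
    """First `count` points n / (2n - 1) of the hierarchy above 1/2."""
    return [Fraction(n, 2 * n - 1) for n in range(1, count + 1)]


points = top_points(6)
half = Fraction(1, 2)

# Each point exceeds 1/2 by exactly 1 / (2 * (2n - 1)), so the gap shrinks to 0.
gaps = [p - half for p in points]
print([str(p) for p in points])
print([str(g) for g in gaps])
```

Since the points form a single strictly decreasing sequence with no accumulation point above $1/2$, their ordering type in descending order is $\omega$.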

First, we prove lower bounds on the ordering type of $A$. $l(p)$ is the largest ordinal $\alpha$ such
that there is an $\omega^{\alpha}$-sequence in $A \cap ]p, 1]$ which converges to $p$. We define $l(p) = 0$ if there is
no such sequence for any $\alpha$.

It is easy to see that $\alpha(p) \ge \omega^{l(p)}$. However, there may be a large gap between these two
ordinals. For example, if $A \cap ]p, 1]$ has the ordering type $\omega + 1$, there is no infinite monotonic
sequence converging to $p$ and $l(p) = 0$. We use the function $l$ to prove lower bounds.

Lemma 29 $l\left(\frac{p}{1+p}\right) \ge \alpha(p)$.

Proof. Transfinite induction over $p \in A$.

Base Case. Let $p = 1$. The ordering type of $A \cap [1, 1] = \{1\}$ is 1. The ordering type of
$A \cap ]1/2, 1]$ is $\omega$ and $l(1/2) = 1$.
Inductive Case. Consider two cases: 

1. p is a successor element. 

Let $p \in [\frac{1}{n}, \frac{1}{n-1}]$. Let $r$ denote the element immediately preceding $p$. We have $\alpha(p) =
\alpha(r) + 1$ because $p$ is the only element of $A \cap [p, 1]$ which does not belong to $A \cap [r, 1]$.
By inductive assumption, $l\left(\frac{r}{1+r}\right) \ge \alpha(r)$.

Consider the splitting of $[\frac{1}{n+1}, \frac{1}{n}]$ in the proof of Theorem 5 (subsection 3.5.1). In the
first step, one of the segments is $[\frac{p}{1+p}, \frac{r}{1+r}]$ because $[p, r]$ does not contain other elements
of $A$. We consider the sequence $r_0, r_1, \ldots$ corresponding to $[\frac{p}{1+p}, \frac{r}{1+r}]$.

Claim 29.1 $l(r_i) \ge \alpha(r)$.

Proof. By induction.

Base Case. Let $i = 0$. Then, $r_0 = \frac{r}{1+r}$ and $l(r_0) = l\left(\frac{r}{1+r}\right) \ge \alpha(r)$.

Inductive Case. We prove $l(r_{i+1}) \ge l(r_i)$. Then $l(r_{i+1}) \ge \alpha(r)$ follows from $l(r_i) \ge \alpha(r)$.
We use

Claim 29.2 If a set is obtained from $\omega^{\alpha}$ by removing a proper initial segment, it still
has ordering type $\omega^{\alpha}$.

Proof. If we remove a segment with ordering type $\beta$, we obtain the set with ordering
type $\omega^{\alpha} - \beta$ (Definition 3). We have $\omega^{\alpha} - \beta = \omega^{\alpha}$ for all $\beta < \omega^{\alpha}$. □

Let

$$f(x) = \frac{2}{\frac{1}{x} + \frac{1}{p} + 1}.$$

$f(x)$ maps $x \in A$ to the number generated from $x$ and $p$ by rule 2 (Lemma 2). Let $x_0$
be such that $f(x_0) = r_i$. The function $f$ maps $(x_0, r_i)$ to $(r_i, r_{i+1})$. ($r_{i+1} = f(r_i)$ by the
definition of $r_{i+1}$.)

We take an $\omega^{\alpha(r)}$ sequence converging to $r_i$, and remove all $x \ge x_0$ from it. The
ordering type of the remaining sequence is still $\omega^{\alpha(r)}$ (Claim 29.2). $f$ maps it to a
sequence converging to $r_{i+1}$ and preserves the ordering. Hence, $l(r_{i+1}) \ge \alpha(r)$. □

We take the union of the $\omega^{\alpha(r)}$ sequences converging to $r_0, r_1, \ldots$ and obtain an $\omega^{\alpha(r)+1}$
sequence converging to $\lim_{i \to \infty} r_i = \frac{p}{1+p}$. Hence, $l\left(\frac{p}{1+p}\right) \ge \alpha(r) + 1 = \alpha(p)$.

2. p is a limit element. 

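Let $p_0, p_1, \ldots$ be a decreasing sequence converging to $p$. Then, $\alpha(p) = \lim_{i \to \infty} \alpha(p_i)$.
We take the union of the sequences converging to $\frac{p_i}{1+p_i}$. It has the ordering type

$$\lim_{i \to \infty} \omega^{\alpha(p_i)} = \omega^{\lim_{i \to \infty} \alpha(p_i)} = \omega^{\alpha(p)}$$

and converges to $\frac{p}{1+p}$. Hence, $l\left(\frac{p}{1+p}\right) \ge \alpha(p)$.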

□ 

Lemma 30 $\alpha\left(\frac{p}{1+p}\right) \ge \omega^{\alpha(p)}$.

Proof. Follows from Lemma 29 and $\alpha\left(\frac{p}{1+p}\right) \ge \omega^{l\left(\frac{p}{1+p}\right)}$. □

The upper bound proof is more complicated. We prove a counterpart of Lemma 30.

Lemma 31 $\alpha\left(\frac{p}{1+p}\right) \le \omega^{\alpha(p)}$.

Proof. Transfinite induction over $p \in A$.

Base Case. Let $p = 1$. The ordering type of $A \cap [1, 1]$ is 1 and the ordering type of $A \cap ]1/2, 1]$
is $\omega$.

Inductive Case. Consider two cases:

1. p is a successor element.

Let $r$ be the element immediately preceding $p$. We split the interval $[\frac{p}{1+p}, \frac{r}{1+r}]$ into
subintervals $[r_1, r_0], [r_2, r_1], \ldots$, as in section 3.5.1.

Claim 31.1 $\alpha(r_i) \le c_i \omega^{\alpha(r)}$ for some $c_i \in \mathbb{N}$.

Proof. By induction.

Base Case. If $i = 0$, $r_0 = \frac{r}{1+r}$ and $\alpha\left(\frac{r}{1+r}\right) = \omega^{\alpha(r)}$ by inductive assumption.

Inductive Case. Let $P$ be a $(r_{i+1}, r_{i+1})$-minimal set (section 3.5.4). Let $A(p_1, \ldots, p_s)$
denote the set of all $x \in A \cap ]r_{i+1}, r_i]$ generated by applications of rule 2 to $p'_1 \in A$,
$\ldots$, $p'_s \in A$ such that $p_1 \le p'_1$, $\ldots$, $p_s \le p'_s$. $\alpha'(p_1, \ldots, p_s)$ denotes the ordering type of
$A(p_1, \ldots, p_s)$.

Claim 31.2

$$\alpha(r_{i+1}) \le \alpha(r_i) \,(+) \sum_{(p_1, \ldots, p_s) \in P} \alpha'(p_1, \ldots, p_s).$$

Proof. We have

$$A \cap ]r_{i+1}, 1] = (A \cap ]r_i, 1]) \cup \bigcup_{(p_1, \ldots, p_s) \in P} A(p_1, \ldots, p_s).$$

By Lemma 1, the ordering type of $A \cap ]r_{i+1}, 1]$ is less than or equal to the natural sum
of the ordering types of $A \cap ]r_i, 1]$ and all $A(p_1, \ldots, p_s)$. □
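
The natural sum $(+)$ used here adds ordinals coefficient by coefficient in Cantor normal form. The Python sketch below illustrates this for ordinals below $\omega^{\omega}$ only, which is a simplifying assumption made for the example: each ordinal is stored as a dictionary from a natural-number exponent to its coefficient, and the representation and function names are hypothetical.

```python
from collections import Counter


def natural_sum(*ordinals):
    """Natural (Hessenberg) sum of ordinals below omega^omega.

    Each ordinal is a dict {exponent: coefficient} standing for the Cantor
    normal form sum of omega^exponent * coefficient over its keys.
    """
    total = Counter()
    for o in ordinals:
        total.update(o)  # Counter.update adds coefficients of equal exponents
    return {e: c for e, c in sorted(total.items(), reverse=True) if c}


def show(o):
    """Readable form, writing w for omega."""
    parts = []
    for e, c in sorted(o.items(), reverse=True):
        base = "1" if e == 0 else ("w" if e == 1 else f"w^{e}")
        parts.append(base if c == 1 else (str(c) if e == 0 else f"{base}*{c}"))
    return " + ".join(parts) if parts else "0"


one = {0: 1}
omega = {1: 1}
print(show(natural_sum(one, one, one)))   # three copies of the ordinal 1
print(show(natural_sum(one, omega)))      # unlike ordinary addition, 1 (+) w is not w
```

Unlike ordinary ordinal addition, the natural sum is commutative, which is why it gives an upper bound for the ordering type of a union taken in arbitrary interleaving.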

Next, we bound each $\alpha'(p_1, \ldots, p_s)$. We start with an auxiliary lemma.

Claim 31.3 If $p \in A$ follows from an application of rule 2 to $p_1 \in A$, $\ldots$, $p_s \in A$,
then

$$\alpha(p) \ge \alpha(p_1)(+) \ldots (+)\alpha(p_s).$$

Proof. Without loss of generality, we assume that $p_1 \le p_2 \le \ldots \le p_s$. Then,
$\alpha(p_1) \ge \alpha(p_2) \ge \ldots \ge \alpha(p_s)$. We prove the lemma by transfinite induction over $p_1$.

Base Case. $p_1$ is the maximum element, i.e. $p_1 = 1$.

Then, $p_1 = \ldots = p_s = 1$. An application of rule 2 to $p_1, \ldots, p_s$ generates
$p = s/(2s - 1)$.

$$A \cap \left[\frac{s}{2s-1}, 1\right] = \left\{ \frac{s}{2s-1}, \frac{s-1}{2s-3}, \ldots, \frac{2}{3}, 1 \right\}.$$

The ordering type of this set is $s$, i.e. $\alpha(p) = s$. On the other hand, $\alpha(p_1) = \ldots =
\alpha(p_s) = 1$ and

$$\alpha(p_1)(+) \ldots (+)\alpha(p_s) = s.$$

Inductive Case. We have two possibilities:

(a) $p_1$ is a successor element.

$j$ denotes the maximum number such that $p_1 = \ldots = p_j$. Let

$$p'_i = \begin{cases} \text{predecessor of } p_i, & \text{if } i \le j \\ p_i, & \text{if } i > j. \end{cases}$$

We have $\alpha(p_i) = \alpha(p'_i) + 1$ for $i \le j$ and $\alpha(p_i) = \alpha(p'_i)$ for $i > j$. Hence,

$$\alpha(p_1)(+) \ldots (+)\alpha(p_s) = (\alpha(p'_1) + 1)(+) \ldots (+)(\alpha(p'_j) + 1)(+)\alpha(p'_{j+1})(+) \ldots (+)\alpha(p'_s) = \alpha(p'_1)(+) \ldots (+)\alpha(p'_s) + j.$$

Let $x_0$ be the number generated by an application of rule 2 to $p'_1, \ldots, p'_s$ and
$x_i$, for $i \in \{1, \ldots, j\}$, be the number generated by an application of rule 2 to
$p_1, \ldots, p_i, p'_{i+1}, \ldots, p'_s$. By inductive assumption,

$$\alpha(x_0) \ge \alpha(p'_1)(+) \ldots (+)\alpha(p'_s).$$

We have $p_i < p'_i$ for $i \le j$. By Lemma 4, $x_i < x_{i-1}$. Hence,

$$\alpha(x_i) \ge \alpha(x_{i-1}) + 1.$$

We have $p_i = p'_i$ for $i > j$. Hence, $x_j = p$ and

$$\alpha(p) = \alpha(x_j) \ge \alpha(x_0) + j \ge \alpha(p'_1)(+) \ldots (+)\alpha(p'_s) + j = \alpha(p_1)(+) \ldots (+)\alpha(p_s).$$

(b) $p_1$ is a limit element.

Again, $j$ is the maximum number such that $p_1 = \ldots = p_j$. Let $p'_1, p'_2, \ldots$ be a
monotonically decreasing sequence converging to $p_1$. Without loss of generality,
we can assume that all elements of $p'_1, p'_2, \ldots$ are less than or equal to
$p_{j+1}$. (Otherwise, just remove the elements that are larger than $p_{j+1}$ and use the
sequence consisting of the remaining elements.)

Let $x_i$ be the number generated by an application of rule 2 to $p'_i, \ldots, p'_i$ ($j$ times), $p_{j+1}$,
$\ldots$, $p_s$. By inductive assumption,

$$\alpha(x_i) \ge \underbrace{\alpha(p'_i)(+) \ldots (+)\alpha(p'_i)}_{j \text{ times}} (+)\alpha(p_{j+1})(+) \ldots (+)\alpha(p_s). \qquad (3)$$

We have $p_1 = \ldots = p_j = \lim_{i \to \infty} p'_i$. By Lemma 5, $p = \lim_{i \to \infty} x_i$. Hence, if we
take $i \to \infty$ in (3) and apply the fact that $(+)$ is continuous, we get

$$\alpha(p) \ge \alpha(p_1)(+) \ldots (+)\alpha(p_s).$$

□

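Claim 31.4 Let $(p_1, \ldots, p_s) \in P$. Then

$$\alpha'(p_1, \ldots, p_s) \le \text{const} \cdot \omega^{\alpha(r)}.$$

Proof. Lemma 2 implies that $\alpha'(p_1, \ldots, p_s)$ is at most the natural product of $\alpha(p_1)$,
$\ldots$, $\alpha(p_s)$. Let $\alpha_j$ be the largest ordinal such that $\alpha(p_j) \ge \omega^{\alpha_j}$. Then, $\alpha(p_j) \le c_j \omega^{\alpha_j}$ for some $c_j \in \mathbb{N}$.
(If there is no such $c_j$, then $\alpha(p_j) \ge \lim_{c \to \infty} c\,\omega^{\alpha_j} = \omega^{\alpha_j + 1}$ and $\alpha_j$ is not the largest
ordinal with this property.) Hence,

$$\alpha'(p_1, \ldots, p_s) \le c_1 \omega^{\alpha_1} (\cdot)\, c_2 \omega^{\alpha_2} (\cdot) \ldots (\cdot)\, c_s \omega^{\alpha_s} = (c_1 c_2 \ldots c_s)\, \omega^{\alpha_1 (+) \alpha_2 (+) \ldots (+) \alpha_s}.$$

Let $p'_j$ be such that $p'_j \in A$ and $\alpha(p'_j) = \alpha_j$. We have $\alpha(p'_j) = \alpha_j \le \alpha(r)$ because

$$\omega^{\alpha_j} \le \alpha(p_j) \le \alpha(r_i) \le \text{const} \cdot \omega^{\alpha(r)} < \omega^{\alpha(r)+1},$$

where $\alpha(p_j) \le \alpha(r_i)$ follows from $p_j \ge r_i$. $\alpha(p'_j) = \alpha_j \le \alpha(r)$ implies $p'_j \ge r$. Therefore,
both Lemma 29 and Lemma 31 are true for $p = p'_j$. This means that $\alpha\left(\frac{p'_j}{1+p'_j}\right) = \omega^{\alpha_j}$.
Hence, $\frac{p'_j}{1+p'_j} \ge p_j$ because $\alpha(p_j) \ge \omega^{\alpha_j}$.

Let $p'$ be the number generated by an application of rule 2 to $p'_1, \ldots, p'_s$. By Lemma
6, $\frac{p'}{1+p'}$ is generated by an application of rule 2 to $\frac{p'_1}{1+p'_1}, \ldots, \frac{p'_s}{1+p'_s}$. $\frac{p'}{1+p'}$ is greater
than or equal to the number generated by an application of rule 2 to $p_1, \ldots, p_s$ because
$\frac{p'_j}{1+p'_j} \ge p_j$. This number is at least $r_{i+1}$ because the tuple $(p_1, \ldots, p_s)$ belongs to the
$(r_{i+1}, r_{i+1})$-allowed set $P$. Hence, $\frac{p'}{1+p'} \ge r_{i+1}$. We have $\frac{p'}{1+p'} \ge \frac{r}{1+r}$ because $[\frac{p}{1+p}, \frac{r}{1+r}]$
does not contain any points of type $\frac{p'}{1+p'}$ with $p' \in A$. This implies $p' \ge r$.
By Claim 31.3,

$$\alpha(p') \ge \alpha(p'_1)(+) \ldots (+)\alpha(p'_s).$$

This implies

$$\alpha'(p_1, \ldots, p_s) \le (c_1 \ldots c_s)\, \omega^{\alpha(p'_1)(+) \ldots (+)\alpha(p'_s)} \le (c_1 \ldots c_s)\, \omega^{\alpha(p')} \le (c_1 \ldots c_s)\, \omega^{\alpha(r)}.$$

□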

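Now, we are ready to finish the proof of Claim 31.1. By Claim 31.2, $\alpha(r_{i+1})$ is less than
or equal to the natural sum of $\alpha(r_i)$ and $\alpha'(p_1, \ldots, p_s)$. We have $\alpha(r_i) \le \text{const} \cdot \omega^{\alpha(r)}$ by
inductive assumption and

$$\alpha'(p_1, \ldots, p_s) \le \text{const} \cdot \omega^{\alpha(r)}$$

by Claim 31.4. Hence, the natural sum of these ordinals is at most $\text{const} \cdot \omega^{\alpha(r)}$, too. □

$$\alpha\left(\frac{p}{1+p}\right) = \lim_{i \to \infty} \alpha(r_i) \le \lim_{i \to \infty} c_i\, \omega^{\alpha(r)} \le \lim_{i \to \infty} \omega \cdot \omega^{\alpha(r)} = \omega^{\alpha(r)+1} = \omega^{\alpha(p)}.$$

2. p is a limit element.

Let $p_0, p_1, \ldots$ be a decreasing sequence converging to $p$. By inductive assumption,
$\alpha\left(\frac{p_i}{1+p_i}\right) \le \omega^{\alpha(p_i)}$. We have

$$\alpha\left(\frac{p}{1+p}\right) = \lim_{i \to \infty} \alpha\left(\frac{p_i}{1+p_i}\right) \le \lim_{i \to \infty} \omega^{\alpha(p_i)} = \omega^{\lim_{i \to \infty} \alpha(p_i)} = \omega^{\alpha(p)}.$$

□

Lemma 32 $\alpha\left(\frac{p}{1+p}\right) = \omega^{\alpha(p)}$.

Proof. Follows from Lemmas 30 and 31. □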

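Theorem 10 The ordering type of $A$ is at least $\varepsilon_0$.

Proof. The ordering type of $A \cap (\frac{1}{2}, 1]$ is $\omega$ (Lemma 28). The ordering type of $A \cap (\frac{1}{3}, 1]$ is
$\omega^{\omega}$ (Lemma 32 with $p = 1/2$), the ordering type of $A \cap (\frac{1}{4}, 1]$ is $\omega^{\omega^{\omega}}$ and so on.
The ordering type of $A$ is the limit of this sequence, i.e.

$$\varepsilon_0 = \lim(\omega, \omega^{\omega}, \omega^{\omega^{\omega}}, \omega^{\omega^{\omega^{\omega}}}, \ldots).$$

□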
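
The tower in this proof comes from iterating the map $p \mapsto p/(1+p)$, which by Lemma 32 raises the ordering type from $\alpha(p)$ to $\omega^{\alpha(p)}$. The Python sketch below tabulates the first levels; the function name and the textual notation `w` for $\omega$ are assumptions made for the illustration.

```python
from fractions import Fraction


def tower(levels):
    """Pairs (p, ordering type label) obtained by iterating p -> p / (1 + p).

    Starts from p = 1 with ordering type 1 and applies Lemma 32 at each step:
    the type at p / (1 + p) is omega raised to the type at p.
    """
    p, label = Fraction(1), "1"
    rows = []
    for _ in range(levels):
        rows.append((p, label))
        p = p / (1 + p)
        label = "w" if label == "1" else f"w^({label})"
    return rows


for p, label in tower(5):
    print(f"{str(p):>4}  {label}")
```

The thresholds produced are $1, 1/2, 1/3, 1/4, \ldots$, so each additional level of the exponential tower is reached by moving the lower end of the interval from $1/n$ to $1/(n+1)$.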

It is known that the ordinal $\varepsilon_0$ expresses the set of all expressions possible in first-order
arithmetic. We see that PFIN, a very simple learning criterion, generates a very complex
probability hierarchy.

The table below shows how the complexity of the hierarchy increases. All results in this
table can be obtained using Lemma 32.

Interval        Ordering type of the probability hierarchy
[1/2, 1]        $\omega$
[…, 1]          $3\omega$
[…, 1]          $\omega^2$
[…, 1]          …
[0, 1]          $\varepsilon_0$

It shows that the known part of the hierarchy ([…, 1]) is very simple compared to the entire
hierarchy.

Notes. The points of the probability hierarchy in the intervals [1/2, 1], […, 1] and […, 1] were
explicitly described in [14], [12] and [9], respectively.

In [9], an $\omega^2$ sequence of points converging to … was presented and it was conjectured
that this sequence forms the backbone of the learning capabilities in the interval […, 1].

5 Probabilistic versus team learning 

For EX-identification, there is a precise correspondence between probabilistic and team learn- 
ers (Pitt's connection [26]). Any probabilistic learner can be simulated by any team with the 
ratio of successful machines equal to the probability of success for the probabilistic learner. 

However, the situation is more complicated for finite learning (FIN and PFIN). Here, 
the learning power of a team depends not only on the ratio of successful machines. Team 
size is also important. 

Theorem 11 [32, 19] $[1, 2]$PFIN $\subset$ $[2, 4]$PFIN.

So, a team of 4 learning machines where 2 machines are required to be successful has
more learning power than a team of 2 learning machines where 1 must succeed. However, in
both teams the ratio of successful machines to all machines is the same (1/2).

This phenomenon is called redundancy. Various redundancy types have been discovered
for various ratios of successful machines [13, 9, 19]. The theorem below is an example of
infinite redundancy [9, 13].

Theorem 12 [9] If $k \bmod 3 \ne 0$, then

$[2k, 5k]$PFIN $\subset$ $[8k, 20k]$PFIN.

In particular,

$[2, 5]$PFIN $\subset$ $[8, 20]$PFIN $\subset$ $[32, 80]$PFIN $\subset \ldots$.

So, for the ratio of successful machines 2/5 there are infinitely many different team sizes
with different learning power.

However, even for PFIN, any probabilistic machine can be simulated by a team with the 
same ratio of success, if we choose the team size carefully. A simple corollary of Theorem 7 
is 

Corollary 5 If $p, q \in \mathbb{N}^+$, then there exists $k$ such that

PFIN($\frac{p}{q}$) = $[pk, qk]$PFIN.

This shows that probabilistic PFIN-learning and team PFIN-learning are of the same
power.

Corollary 6 If $p, q \in \mathbb{N}^+$, then there exists $k$ such that

$[pl, ql]$PFIN $\subseteq$ $[pk, qk]$PFIN

for any $l \in \mathbb{N}^+$.

Proof. The team of $ql$ machines can be simulated by a single probabilistic machine which
equiprobably chooses one of the machines in the team and simulates it. Hence, Corollary 5 implies
that

$[pl, ql]$PFIN $\subseteq$ PFIN($\frac{p}{q}$) = $[pk, qk]$PFIN.

□
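
The first step of this proof is a simple counting argument, sketched below in Python. The function name and the example team are assumptions: the list of booleans stands for which team members happen to learn a given function.

```python
from fractions import Fraction


def probabilistic_success(team_outcomes):
    """Success probability of a machine that picks one team member uniformly.

    team_outcomes: list of booleans, True where that team member outputs a
    correct program for the function being learned.
    """
    return Fraction(sum(team_outcomes), len(team_outcomes))


# Hypothetical [pl, ql] team with p = 2, q = 5, l = 3: at least 6 of the
# 15 members succeed on every function the team learns.
outcomes = [True] * 6 + [False] * 9
print(probabilistic_success(outcomes))
```

Whatever the value of $l$, the resulting probability is at least $p/q$, which is why the team size disappears once the team is replaced by a single probabilistic machine.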

So, we see that redundancy structures can be very complicated, but there is always a
"best" team size such that a team of this size can simulate any other team with the same ratio
of successful machines. It exists even if there are infinitely many team sizes with different
learning power (as for the ratio 2/5, Theorem 12).

6 Conclusion 

We have investigated the structure of probability hierarchy for PFIN-type learning. Instead 
of trying to determine the exact points at which the learning capabilities change, we focused 
on the structural properties of the hierarchy. 

We have developed a universal diagonalization algorithm (Theorem 4) and a universal 
simulation algorithm (Theorem 7). These algorithms are very general forms of diagonaliza- 
tion and simulation arguments used for probabilistic PFIN [9, 12]. 

The universal diagonalization theorem gives a method that can be used to obtain any
possible diagonalization for probabilistic PFIN. The universal simulation algorithm can be used
for any possible simulation.

These two results together give us a recursive description of the set of points $A$ at which
the learning capabilities are different.

This set is well-ordered in decreasing ordering. (This property is essential to the proof
of Theorem 7.) Its structure is quite complicated. Namely, its ordering type is $\varepsilon_0$, the
ordering type of the set of all expressions possible in first-order arithmetic.

It shows the huge complexity of the probabilistic PFIN-hierarchy and explains why it is
so difficult to find the points at which the learning capabilities are different.

A simple corollary of our results is that probabilistic and team PFIN-type learning are
of the same power, i.e. any probabilistic learning machine can be simulated by a team with
the same success ratio.

Several open problems remain:

1. Unrestricted finite learning (FIN).

The major open problem is the generalization of our results to other learning paradigms
such as (non-Popperian) FIN-type learning and language learning in the limit.

Theorem 4 can be proved for (non-Popperian) FIN-type learning, too. Hence, if

PFIN($p_1$) $\ne$ PFIN($p_2$),

then

FIN($p_1$) $\ne$ FIN($p_2$).

So, the probability hierarchy of FIN is at least as complicated as the probability
hierarchy of PFIN. It is even more complicated because it is known [11, 12] that

FIN(24/49) $\subset$ FIN(1/2)

but

PFIN(24/49) = PFIN(1/2).

The simulation techniques for FIN are much more complicated than the simulation
techniques for PFIN. However, we hope that some combination of our methods and other
ideas (e.g. [11, 10]) can help to identify the set of all possible diagonalization methods
for FIN and to prove that no other diagonalization methods exist (i.e. to construct a
universal simulation for FIN).

A step in that direction was made in [3] by proving that FIN-hierarchy is well-ordered 
and recursively enumerable. It still remains open whether it is decidable. The proof 
technique in [3] is different from ours and uses capability trees [10]. 

2. Probabilistic language learning. 

The probability hierarchy of language learning in the limit [17] has some similarities to 
FIN and PFIN-hierarchies. 

It is an interesting open problem whether some analogues of our results can be obtained 
for language learning in the limit. 

3. What is the computational complexity of decision algorithms for PFIN-hierarchy? 

4. How dense is the probability hierarchy? 

Can we prove a result of the following type:
If $p_1, p_2 \in [\ldots, 1]$ and $|p_1 - p_2| < (1/2)^{\ldots}$, then

PFIN($p_1$) $\ne$ PFIN($p_2$)?

Other properties of the whole hierarchy can be studied, too.